Abstract

We consider a finite horizon dynamic game with N selfish players who observe their types privately and 
take actions, which are publicly observed. Players’ types evolve as conditionally independent Markov processes, 
conditioned on their current actions. Their actions and types jointly determine their instantaneous rewards. Since 
each player has a different information set, this is a dynamic game with asymmetric information and there is no 
known methodology to find perfect Bayesian equilibria (PBE) for such games in general. In this paper, we develop a 
methodology to obtain a class of PBE using a belief state based on players’ common information. We first show that 
any expected reward profile that can be achieved by any general strategy profile can also be achieved by a policy 
based on players’ private information and this belief state. With this structural result as our motivation, we develop 
our main result that provides a two-step backward-forward recursive algorithm to find a class of PBE of this game 
that are based on this belief state. We refer to such equilibria as structured perfect Bayesian equilibria (SPBE). The 
backward recursive part of this algorithm defines an equilibrium generating function. Each period in the backward 
recursion involves solving a fixed point equation on the space of probability simplexes for every possible belief on 
types. Using this function, equilibrium strategies and beliefs are generated through a forward recursion. 


I. Introduction 

There are many practical scenarios where strategic players with different sets of observations are involved 
in a time-evolving dynamical process such that their actions influence each others’ payoffs. Such scenarios 
include repeated online advertisement auctions, wireless resource sharing, competing sellers and energy 
markets. In the case of repeated online advertisement auctions, advertisers place bids for locations on a 
website to sell a product. These bids are based on the value of that product, which is privately observed by 
an advertiser, and on the past actions of everybody else, which are observed publicly. Each advertiser’s goal is to 
maximize its reward, which depends on the value of the products and on the actions taken by everybody else. 
A similar scenario can be considered for wireless resource sharing where players are allocated channels that 
interfere with each other. Each player privately observes its channel gain and takes actions, which may be the 
choice of modulation and coding scheme and also the transmission power. The reward here is the rate each 
player gets at time t, which is a function of everyone’s channel gain and actions. Consider another scenario 
where different sellers compete to sell different but related goods which are complementary, substitutable 
or in general, with externalities. The true value of the goods is private information of a seller who, at 
each stage, takes an action to stock some amount of goods for sale. Her profit is based on some market 
mechanism (say through Walrasian prices) based on the true value of all the goods and their availability 
in the market, which depends on the actions of the other sellers. Each seller wants to maximize her own 
profit. Finally, a similar scenario also exists for energy markets where different suppliers (to their different 
end consumers) bid their estimated power outputs to an independent system operator (ISO) that forms 
the market mechanism to determine the prices assessed to the different suppliers. Each supplier wants to 
maximize its returns, which depend on its cost of production of energy, which is its private information, 
and the market-determined prices which depend on all the bids. 
Such dynamical systems with strategic players are modeled as dynamic games. In dynamic games with perfect and symmetric information, subgame perfect equilibrium (SPE) is an appropriate equilibrium concept [?], [?], [?], and there is a backward recursive algorithm to find all subgame perfect equilibria of such games. Maskin and Tirole in [?] introduced the concept of Markov perfect equilibrium (MPE) for dynamic games with perfect and symmetric information, where equilibrium strategies depend on some payoff-relevant state of the system rather than on the entire history. However, for games with asymmetric information, since players have different information sets in each period, they need to form a belief on the information sets of other players, based upon which they predict their strategies. As a result, SPE or MPE are not appropriate equilibrium concepts for such settings. There are several notions of equilibrium for such games, such as perfect Bayesian equilibrium (PBE), sequential equilibrium, and trembling hand equilibrium [?], [?]. Each of these notions of equilibrium consists of a strategy and a belief profile of all players. The equilibrium strategies are optimal given the beliefs, and the beliefs are derived from the equilibrium strategy profile using Bayes’ rule (whenever possible), with some equilibrium concepts requiring further refinements. Due to this circular argument of beliefs being consistent with strategies, which are in turn optimal given the beliefs, finding such equilibria is a difficult task. Moreover, strategies are functions of histories, which belong to an ever-expanding space, and thus the space of optimization also becomes computationally intractable. There is no known methodology to find such equilibria for general dynamic games with asymmetric information.

In this paper, we consider a model where players observe their types privately and publicly observe the actions taken by other players at the end of each period. Their instantaneous rewards depend on everyone’s types and actions. We provide a two-step algorithm involving a backward recursion followed by a forward recursion to construct a class of PBE for the dynamic game in consideration, which we call structured perfect Bayesian equilibria (SPBE). In these equilibria, players’ strategies are based on their type and a set of beliefs on each type, which is common to all players and lies in a time-invariant space. These beliefs on players’ types form independent controlled Markov processes that together summarize the common information history and are updated individually and sequentially, based on the corresponding agents’ actions and (partial) strategies. The algorithm works as follows. In a backward recursive way, for each stage, the algorithm finds an equilibrium strategy function for all possible beliefs on types of the players, which involves solving a fixed point equation on the space of probability simplexes. Then, the equilibrium strategies and beliefs are obtained through forward recursion by operating on the function obtained in the backward step. The SPBEs that are developed in this paper are analogous to the MPEs for dynamic games with perfect information in the sense that players choose their actions based on beliefs that depend on common information and have Markovian dynamics, where the actions of a player are now partial functions from its private information to its action set.

Related literature on this topic includes [?], [?] and [?]. Nayyar et al. in [?], [?] consider a model of dynamic games with asymmetric information. There is an underlying controlled Markov process where players jointly observe part of the process and also make some observations privately. It is shown in [?], [?] that the considered game with asymmetric information, under certain assumptions, can be transformed to another game with symmetric information. Once this is established, a backward recursive algorithm is provided to find MPE of the transformed game, which are equivalently Nash equilibria of the transformed symmetric information game. For this strong equivalence to hold, the authors in [?], [?] make a critical assumption in their model: based on the common information, a player’s posterior beliefs about the system state and about other players’ information are independent of the strategies used by the players in the past. Our model is different from the model considered in [?], [?]. We assume that the underlying state of the system has independent components, each constituting the type of a player. However, we do not make any assumption regarding the update of beliefs and allow the common information based belief state to depend on players’ strategies.

Ouyang et al. in [?] consider a dynamic oligopoly game with N strategic sellers of different goods and M strategic buyers. Each seller privately observes the valuation of its good, which is assumed to have independent Markovian dynamics, thus resulting in a dynamic game of asymmetric information. In each period, sellers post prices for their goods and buyers make decisions regarding buying the goods. Then a public signal indicating the buyers’ experience is revealed, which depends on the sellers’ valuations of the goods. The authors in [?] consider a policy-dependent common information based belief state, based on which they define the concept of common information based equilibria. They show that for any given update function of this belief state that is consistent with the strategies of the players, if all other players play actions based on this common belief and their private information, then player i faces a Markov decision process (MDP) with respect to its action, with state being the common belief and its type. For every prior distribution, this defines a fixed point equation on belief update functions and strategies of all players. They provide necessary and sufficient conditions for a common information based strategy profile and belief update functions to constitute a PBE of the game; however, they do not provide a systematic way to find such equilibria. In addition, because of the special structure of the reward function, the problem admits a degenerate solution where agents’ strategies do not depend on their private information and therefore no signaling takes place. This allows existence of myopic, type-independent equilibrium policies (although other equilibria may also exist).

The paper is organized as follows. In Section II, we present our model. In Section III, we present structural results that serve as motivation for SPBE. In Section IV, we present the main result by providing a two-step backward-forward recursive algorithm to construct a strategy profile and a sequence of beliefs, and show that it is a PBE of the dynamic game considered. As an illustration, we apply this algorithm to a discrete version of an example from [?] on a repeated public good game in Section V. We conclude in Section VI. All proofs are presented in the Appendices.


A. Notation 

We use uppercase letters for random variables and lowercase for their realizations. For any variable, subscripts represent time indices and superscripts represent player identities. We use notation $-i$ to represent all players other than player $i$, i.e. $-i = \{1, 2, \ldots, i-1, i+1, \ldots, N\}$. We use notation $A_{t:t'}$ to represent the vector $(A_t, A_{t+1}, \ldots, A_{t'})$ when $t' \ge t$, or an empty vector if $t' < t$. We use $A_t^{-i}$ to mean $(A_t^1, A_t^2, \ldots, A_t^{i-1}, A_t^{i+1}, \ldots, A_t^N)$. We remove superscripts or subscripts if we want to represent the whole vector, for example $A_t$ represents $(A_t^1, \ldots, A_t^N)$. In a similar vein, for any collection of sets $(\mathcal{X}^i)_{i \in \mathcal{N}}$, we denote $\times_{i \in \mathcal{N}} \mathcal{X}^i$ by $\mathcal{X}$. We denote the indicator function of any set $A$ by $I_A(\cdot)$. For any finite set $\mathcal{S}$, $\mathcal{P}(\mathcal{S})$ represents the space of probability measures on $\mathcal{S}$ and $|\mathcal{S}|$ represents its cardinality. We denote by $P^g$ (or $E^g$) the probability measure generated by (or expectation with respect to) strategy profile $g$. We denote the set of real numbers by $\mathbb{R}$. For a probabilistic strategy profile of players $(\beta_t^i)_{i \in \mathcal{N}}$, where the probability of action $a_t^i$ conditioned on $a_{1:t-1}, x_{1:t}^i$ is given by $\beta_t^i(a_t^i | a_{1:t-1}, x_{1:t}^i)$, we use the shorthand notation $\beta_t^{-i}(a_t^{-i} | a_{1:t-1}, x_{1:t}^{-i})$ to represent $\prod_{j \ne i} \beta_t^j(a_t^j | a_{1:t-1}, x_{1:t}^j)$. All equalities and inequalities involving random variables are to be interpreted in the a.s. sense.


II. Model 

We consider a discrete-time dynamical system with $N$ strategic players in the set $\mathcal{N} := \{1, 2, \ldots, N\}$, over a time horizon $\mathcal{T} := \{1, 2, \ldots, T\}$ and with perfect recall. There is a dynamic state of the system $x_t := (x_t^1, x_t^2, \ldots, x_t^N)$, where $x_t^i \in \mathcal{X}^i$ is the type of player $i$ at time $t$, which is perfectly observed and is its private information. Types of the players evolve as conditionally independent, controlled Markov processes such that

$$P(x_1) = \prod_{i=1}^{N} Q_1^i(x_1^i) \tag{1a}$$

$$P(x_t | x_{1:t-1}, a_{1:t-1}) = P(x_t | x_{t-1}, a_{t-1}) \tag{1b}$$

$$= \prod_{i=1}^{N} Q_t^i(x_t^i | x_{t-1}^i, a_{t-1}), \tag{1c}$$

where $Q_t^i$ are known kernels. Player $i$ at time $t$ takes action $a_t^i \in \mathcal{A}^i$ on observing $a_{1:t-1}$, which is common information among players, and $x_{1:t}^i$, which it observes privately. The sets $\mathcal{A}^i, \mathcal{X}^i$ are assumed to be finite. Let $g^i = (g_t^i)_{t \in \mathcal{T}}$ be a probabilistic strategy of player $i$, where $g_t^i : \mathcal{A}^{t-1} \times (\mathcal{X}^i)^t \to \mathcal{P}(\mathcal{A}^i)$, such that player $i$ plays action $A_t^i$ according to $A_t^i \sim g_t^i(\cdot | a_{1:t-1}, x_{1:t}^i)$. Let $g := (g^i)_{i \in \mathcal{N}}$ be a strategy profile of all players. At the end of interval $t$, player $i$ receives an instantaneous reward $R^i(x_t, a_t)$. The objective of player $i$ is to maximize its total expected reward

$$J^{i,g} := E^g \left[ \sum_{t=1}^{T} R^i(X_t, A_t) \right]. \tag{2}$$

With all players being strategic, this problem is modeled as a dynamic game $\mathfrak{D}$ with imperfect and asymmetric information, and with simultaneous moves.
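The dynamics in (1a)-(1c) and the reward accumulation in (2) can be illustrated with a small simulation. The following is a minimal sketch, not part of the paper's method: the function names `sample` and `simulate`, and the choice to represent distributions as Python dictionaries and kernels, strategies, and rewards as callables, are all assumptions made for illustration.

```python
import random

def sample(dist, rng):
    """Draw one outcome from a dict {outcome: probability}."""
    r, acc = rng.random(), 0.0
    for outcome, p in dist.items():
        acc += p
        if r < acc:
            return outcome
    return outcome  # fallback for rounding error in the probabilities

def simulate(T, priors, kernels, strategies, rewards, seed=0):
    """Simulate one play path of the game (hypothetical interface).

    priors[i]      : dict over types, playing the role of Q_1^i
    kernels[i]     : callable (x_i, a) -> dict over next types, Q_{t+1}^i(.|x_t^i, a_t)
    strategies[i]  : callable (t, past_actions, own_type_history) -> dict over actions
    rewards[i]     : callable (x, a) -> float, the instantaneous reward R^i
    Returns the realized action path and each player's total reward.
    """
    rng = random.Random(seed)
    n = len(priors)
    # Types are drawn independently across players, as in (1a).
    x = tuple(sample(priors[i], rng) for i in range(n))
    own_types = [[x[i]] for i in range(n)]
    past_actions = []
    totals = [0.0] * n
    for t in range(1, T + 1):
        # Each player acts on the public action history and its own private types.
        a = tuple(
            sample(strategies[i](t, tuple(past_actions), tuple(own_types[i])), rng)
            for i in range(n)
        )
        for i in range(n):
            totals[i] += rewards[i](x, a)
        past_actions.append(a)
        if t < T:
            # Each type evolves through its own kernel, given the joint action, as in (1c).
            x = tuple(sample(kernels[i](x[i], a), rng) for i in range(n))
            for i in range(n):
                own_types[i].append(x[i])
    return past_actions, totals
```

Averaging `totals[i]` over many seeds would give a Monte Carlo estimate of the objective in (2) for a given strategy profile.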


III. Motivation for structured equilibria 

In this section we present structural results for the considered dynamical process that serve as a motivation for finding SPBE of the underlying game $\mathfrak{D}$. Specifically, we define a belief state based on the common information history and show that any reward profile that can be obtained through a general strategy profile can also be obtained through strategies that depend on this belief state and the player’s current type, which is its private information. These structural results are inspired by the analysis of decentralized team problems, which serve as guiding principles to design our equilibrium strategies. While these structural results provide intuition and the required notation, they are not directly used in the proofs for finding SPBEs later, in Section IV.

At any time $t$, player $i$ has information $(a_{1:t-1}, x_{1:t}^i)$, where $a_{1:t-1}$ is the common information among players and $x_{1:t}^i$ is the private information of player $i$. Since $(a_{1:t-1}, x_{1:t}^i)$ increases with time, any strategy of the form $A_t^i \sim g_t^i(\cdot | a_{1:t-1}, x_{1:t}^i)$ becomes unwieldy. Thus it is desirable to have an information state in a time-invariant space that succinctly summarizes $(a_{1:t-1}, x_{1:t}^i)$ and that can be sequentially updated. We first show in Fact 1 that, given the common information $a_{1:t-1}$ and its current type $x_t^i$, player $i$ can discard its type history $x_{1:t-1}^i$ and play a strategy of the form $A_t^i \sim s_t^i(\cdot | a_{1:t-1}, x_t^i)$. Then in Fact 2 we show that $a_{1:t-1}$ can be summarized through a belief $\pi_t$, defined as follows. For any strategy profile $g$, belief $\pi_t$ on $X_t$, $\pi_t \in \mathcal{P}(\mathcal{X})$, is defined as $\pi_t(x_t) := P^g(X_t = x_t | a_{1:t-1})$ $\forall x_t \in \mathcal{X}$. We also define the marginals $\pi_t^i(x_t^i) := P^g(X_t^i = x_t^i | a_{1:t-1})$ $\forall x_t^i \in \mathcal{X}^i$.

For player $i$, we use notation $g$ to denote a general policy of type $A_t^i \sim g_t^i(\cdot | a_{1:t-1}, x_{1:t}^i)$, notation $s$, where $s_t^i : \mathcal{A}^{t-1} \times \mathcal{X}^i \to \mathcal{P}(\mathcal{A}^i)$, to denote a policy of the form $s_t^i(a_t^i | a_{1:t-1}, x_t^i)$, and notation $m$, where $m_t^i : \mathcal{P}(\times_{i \in \mathcal{N}} \mathcal{X}^i) \times \mathcal{X}^i \to \mathcal{P}(\mathcal{A}^i)$, to denote a policy of the form $m_t^i(a_t^i | \pi_t, x_t^i)$. It should be noted that since $\pi_t$ is a function of random variables $a_{1:t-1}$, an $m$ policy is a special type of $s$ policy, which, in turn, is a special type of $g$ policy.

Using the agent-by-agent approach [?], we show in Fact 1 that any expected reward profile of the players that can be achieved by any general strategy profile $g$ can also be achieved by a strategy profile $s$.

Fact 1: Given a fixed strategy $g^{-i}$ of all players other than player $i$ and for any strategy $g^i$ of player $i$, there exists a strategy $s^i$ of player $i$ such that

$$P^{s^i g^{-i}}(x_t, a_t) = P^{g^i g^{-i}}(x_t, a_t) \quad \forall t \in \mathcal{T},\ x_t \in \mathcal{X},\ a_t \in \mathcal{A}, \tag{3}$$

which implies $J^{i, s^i g^{-i}} = J^{i, g^i g^{-i}}$.

Proof: See Appendix. ■

Since any $s^i$ policy is also a $g^i$ type policy, the above fact can be iterated over all players, which implies that for any $g$ policy profile there exists an $s$ policy profile that achieves the same reward profile, i.e. $(J^{i,s})_{i \in \mathcal{N}} = (J^{i,g})_{i \in \mathcal{N}}$.

Policies of type $s$ still have an increasing domain due to the increasing common information $a_{1:t-1}$. In order to summarize this information, we take an equivalent view of the system dynamics through a common agent, as taken by Nayyar et al. in [?]. The common agent approach is a general approach that has been used extensively for dynamic team problems [10]-[13]. Using this approach, the problem can be equivalently described as follows: player $i$ at time $t$ observes $a_{1:t-1}$ and takes action $\gamma_t^i$, where $\gamma_t^i : \mathcal{X}^i \to \mathcal{P}(\mathcal{A}^i)$ is a partial (stochastic) function from its private information $x_t^i$ to $a_t^i$, of the form $\gamma_t^i(a_t^i | x_t^i)$. These actions are generated through some policy $\psi^i = (\psi_t^i)_{t \in \mathcal{T}}$, $\psi_t^i : \mathcal{A}^{t-1} \to \{\mathcal{X}^i \to \mathcal{P}(\mathcal{A}^i)\}$, that operates on the common information $a_{1:t-1}$ so that $\Gamma_t^i = \psi_t^i[a_{1:t-1}]$. Then any policy of the form $A_t^i \sim s_t^i(\cdot | a_{1:t-1}, x_t^i)$ is equivalent to $A_t^i \sim \psi_t^i[a_{1:t-1}](\cdot | x_t^i)$.

We call a player $i$’s policy through the common agent to be of type $\psi^i$ if its actions $\Gamma_t^i$ are taken as $\Gamma_t^i = \psi_t^i[a_{1:t-1}]$. We call a player $i$’s policy through the common agent to be of type $\theta^i$, where $\theta_t^i : \mathcal{P}(\mathcal{X}) \to \{\mathcal{X}^i \to \mathcal{P}(\mathcal{A}^i)\}$, if its actions $\Gamma_t^i$ are taken as $\Gamma_t^i = \theta_t^i[\pi_t]$. A policy of type $\theta^i$ is also a policy of type $\psi^i$. There is a one-to-one correspondence between policies of type $s^i$ and of type $\psi^i$, and between policies of type $m^i$ and of type $\theta^i$.

In the following fact, we show that the space of profiles of type $s$ is outcome-equivalent to the space of profiles of type $m$.

Fact 2: For any given strategy profile $s$ of all players, there exists a strategy profile $m$ such that

$$P^m(x_t, a_t) = P^s(x_t, a_t) \quad \forall t \in \mathcal{T},\ x_t \in \mathcal{X},\ a_t \in \mathcal{A}, \tag{4}$$

which implies $(J^{i,m})_{i \in \mathcal{N}} = (J^{i,s})_{i \in \mathcal{N}}$. Furthermore, $\pi_t$ can be factorized as $\pi_t(x_t) = \prod_{i=1}^{N} \pi_t^i(x_t^i)$, where each $\pi_t^i$ can be updated through an update function

$$\pi_{t+1}^i = \bar{F}(\pi_t^i, \gamma_t^i, a_t), \tag{5}$$

where $\bar{F}$ is independent of $s$.

Proof: See Appendix. ■

The above two facts show that any reward profile that can be generated through a policy profile of type $g$ can also be generated through a policy profile of type $m$. It should be noted that the construction of $s^i$, as in (28d), depends only on $g^i$, while the construction of $m^i$ depends on the whole policy profile $g$ and not just on $g^i$, since the construction of $\theta^i$ depends on $\psi$ in (40). Thus any unilateral deviation of player $i$ in a $g$ policy profile does not necessarily translate to a unilateral deviation of player $i$ in the corresponding $m$ policy profile. Therefore $g$ being an equilibrium of the game (in some appropriate notion) does not necessitate the corresponding $m$ also being an equilibrium.

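As shown in the previous facts, due to the independence of types and their evolution as independent controlled Markov processes, for any strategy of the players, joint beliefs on types can be factorized as a product of their marginals, i.e. $\pi_t(x_t) = \prod_{i=1}^{N} \pi_t^i(x_t^i)$. Since in this paper we only deal with such joint beliefs, to accentuate this independence structure, we define $\underline{\pi}_t \in \times_{i \in \mathcal{N}} \mathcal{P}(\mathcal{X}^i)$ as the vector of marginal beliefs, where $\underline{\pi}_t := (\pi_t^i)_{i \in \mathcal{N}}$. In the rest of the paper, we will use $\underline{\pi}_t$ instead of $\pi_t$ whenever appropriate, where of course $\pi_t$ can be constructed from $\underline{\pi}_t$. Similarly, we define the vector of belief updates as $F(\underline{\pi}, \gamma, a) := (\bar{F}(\pi^i, \gamma^i, a))_{i \in \mathcal{N}}$. We also change the notation of policies of type $m$ as $m_t^i : \times_{i \in \mathcal{N}} \mathcal{P}(\mathcal{X}^i) \times \mathcal{X}^i \to \mathcal{P}(\mathcal{A}^i)$ and of the common agent’s policies of type $\theta$ as $\theta_t^i : \times_{i \in \mathcal{N}} \mathcal{P}(\mathcal{X}^i) \to \{\mathcal{X}^i \to \mathcal{P}(\mathcal{A}^i)\}$.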
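The per-player belief update is defined in the paper's appendix, which is not reproduced in this section. As an illustration only, the sketch below assumes the standard form such an update takes: a Bayes-rule correction of the marginal belief using player $i$'s observed action and its partial function, followed by a one-step prediction through the type kernel. The function name `belief_update`, the dictionary representation, and in particular the handling of zero-probability actions (keeping the prior) are assumptions, not the paper's definition.

```python
def belief_update(pi_i, gamma_i, kernel_i, a, i):
    """One step of an assumed Bayes-rule belief update for player i's type.

    pi_i     : dict type -> probability (current marginal belief on x_t^i)
    gamma_i  : dict type -> dict action -> probability (partial function gamma_t^i)
    kernel_i : callable (x_i, a) -> dict next_type -> probability
    a        : realized action profile (tuple), a[i] is player i's action
    Returns a dict next_type -> probability (belief on x_{t+1}^i).
    """
    # Bayes correction: weight each type by the likelihood of the observed action.
    weights = {x: p * gamma_i[x].get(a[i], 0.0) for x, p in pi_i.items()}
    z = sum(weights.values())
    if z == 0.0:
        # Observed action has zero probability under gamma_i: Bayes' rule does not
        # apply. ASSUMPTION: fall back to the uncorrected belief.
        posterior = dict(pi_i)
    else:
        posterior = {x: w / z for x, w in weights.items()}
    # Prediction: push the corrected belief through the type kernel.
    nxt = {}
    for x, p in posterior.items():
        for x_next, q in kernel_i(x, a).items():
            nxt[x_next] = nxt.get(x_next, 0.0) + p * q
    return nxt
```

Note that the update for player $i$ uses only $\pi^i$, $\gamma^i$ and the action profile, which is what allows the marginals to be updated individually.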

We end this section by noting that finding general PBEs of type $g$ of the game $\mathfrak{D}$ would be a desirable goal, but since the space of strategies grows exponentially with time, that would be computationally intractable. However, Facts 1 and 2 together suggest that strategies of type $m$ form a class that is rich in the sense that they achieve every possible reward profile. Since these strategies are functions of beliefs $\pi_t$ that lie in a time-invariant space and are easily updatable, equilibria of this type are potential candidates for computation through backward recursion. In this paper our goal is to devise an algorithm to find structured equilibria of type $m$ of the dynamic game $\mathfrak{D}$.


IV. Algorithm for SPBE computation 

A. Preliminaries 

Any history of this game at which players take action is of the form $h_t = (a_{1:t-1}, x_{1:t})$. Let $\mathcal{H}_t$ be the set of such histories of the game at time $t$ when players take action, and let $\mathcal{H} := \cup_{t=0}^{T} \mathcal{H}_t$ be the set of all possible such histories. At any time $t$, player $i$ observes $h_t^i = (a_{1:t-1}, x_{1:t}^i)$ and all players together have $h_t^c = a_{1:t-1}$ as common history. Let $\mathcal{H}_t^i$ be the set of observed histories of player $i$ at time $t$ and $\mathcal{H}_t^c$ be the set of common histories at time $t$. An appropriate concept of equilibrium for such games is PBE [?], which consists of a pair $(\beta^*, \mu^*)$ of a strategy profile $\beta^* = (\beta_t^{*,i})_{t \in \mathcal{T}, i \in \mathcal{N}}$, where $\beta_t^{*,i} : \mathcal{H}_t^i \to \mathcal{P}(\mathcal{A}^i)$, and a belief profile $\mu^* = ({}^i\mu_t^*)_{t \in \mathcal{T}, i \in \mathcal{N}}$, where ${}^i\mu_t^* : \mathcal{H}_t^i \to \mathcal{P}(\mathcal{H}_t)$, that satisfy sequential rationality so that $\forall i \in \mathcal{N},\ t \in \mathcal{T},\ h_t^i \in \mathcal{H}_t^i,\ \beta^i$,

$$E^{(\beta^{*,i} \beta^{*,-i},\, \mu^*)} \left\{ \sum_{n=t}^{T} R^i(X_n, A_n) \,\Big|\, h_t^i \right\} \ge E^{(\beta^{i} \beta^{*,-i},\, \mu^*)} \left\{ \sum_{n=t}^{T} R^i(X_n, A_n) \,\Big|\, h_t^i \right\}, \tag{6}$$

and the beliefs satisfy some consistency conditions as described in [?, p. 331]. In general, a belief for player $i$ at time $t$, ${}^i\mu_t^*$, is defined on history $h_t = (a_{1:t-1}, x_{1:t})$ given its private history $h_t^i = (a_{1:t-1}, x_{1:t}^i)$. Here player $i$’s private history $h_t^i = (a_{1:t-1}, x_{1:t}^i)$ consists of a public part $h_t^c = a_{1:t-1}$ and a private part $x_{1:t}^i$. At any time $t$, the relevant uncertainty player $i$ has is about other players’ types $x_t^{-i}$. In our setting, due to independence of types, player $i$’s current type $x_t^i$ does not provide any information about $x_t^{-i}$, as will be shown later. For this reason we consider beliefs that are functions of each agent’s history $h_t^i$ only through the common history $h_t^c$. Hence, for each agent $i$, its belief for each history $h_t^c = a_{1:t-1}$ is derived from a common belief $\mu_t^*[a_{1:t-1}]$, which itself factorizes into a product of marginals $\prod_{j \in \mathcal{N}} \mu_t^{*,j}[a_{1:t-1}]$, as will be shown later. Thus we can sufficiently use the system of beliefs $\mu^* = (\underline{\mu}_t^*)_{t \in \mathcal{T}}$ with $\underline{\mu}_t^* : \mathcal{H}_t^c \to \times_{j \in \mathcal{N}} \mathcal{P}(\mathcal{X}^j)$, with the understanding that agent $i$’s belief on $x_t^{-i}$ is $\mu_t^{*,-i}[a_{1:t-1}](x_t^{-i}) = \prod_{j \ne i} \mu_t^{*,j}[a_{1:t-1}](x_t^j)$. Under the above structure, all consistency conditions that are required for PBEs [?, p. 331] are automatically satisfied.

Structural results from Section III provide us motivation to study equilibria of the form $(m_t^i(a_t^i | \underline{\pi}_t, x_t^i))_{i \in \mathcal{N}}$, which are equivalent to policy profiles of the form $(\theta_t^i[\underline{\pi}_t](a_t^i | x_t^i))_{i \in \mathcal{N}}$ and have the advantage of being defined on a time-invariant space.








B. Backward Recursion 

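In this section, we define an equilibrium generating function $\theta = (\theta_t^i)_{i \in \mathcal{N}, t \in \mathcal{T}}$, where $\theta_t^i : \times_{i \in \mathcal{N}} \mathcal{P}(\mathcal{X}^i) \to \{\mathcal{X}^i \to \mathcal{P}(\mathcal{A}^i)\}$, and a sequence of functions $(V_t^i)_{i \in \mathcal{N}, t \in \{1, 2, \ldots, T+1\}}$, where $V_t^i : \times_{i \in \mathcal{N}} \mathcal{P}(\mathcal{X}^i) \times \mathcal{X}^i \to \mathbb{R}$, in a backward recursive way, as follows.

1. Initialize $\forall \underline{\pi}_{T+1} \in \times_{i \in \mathcal{N}} \mathcal{P}(\mathcal{X}^i),\ x_{T+1}^i \in \mathcal{X}^i$,

$$V_{T+1}^i(\underline{\pi}_{T+1}, x_{T+1}^i) := 0. \tag{7}$$

2. For $t = T, T-1, \ldots, 1$, $\forall \underline{\pi}_t \in \times_{i \in \mathcal{N}} \mathcal{P}(\mathcal{X}^i)$, let $\theta_t[\underline{\pi}_t]$ be generated as follows. Set $\tilde{\gamma}_t = \theta_t[\underline{\pi}_t]$, where $\tilde{\gamma}_t$ is the solution, if it exists$^1$, of the following fixed point equation, $\forall i \in \mathcal{N},\ x_t^i \in \mathcal{X}^i$,

$$\tilde{\gamma}_t^i(\cdot | x_t^i) \in \arg\max_{\gamma_t^i(\cdot | x_t^i)} E^{\gamma_t^i(\cdot | x_t^i) \tilde{\gamma}_t^{-i},\, \pi_t} \left\{ R^i(X_t, A_t) + V_{t+1}^i(F(\underline{\pi}_t, \tilde{\gamma}_t, A_t), X_{t+1}^i) \,\Big|\, x_t^i \right\}, \tag{8}$$

where the expectation in (8) is with respect to random variables $(X_t^{-i}, A_t, X_{t+1}^i)$ through the measure $\pi_t^{-i}(x_t^{-i})\, \gamma_t^i(a_t^i | x_t^i)\, \tilde{\gamma}_t^{-i}(a_t^{-i} | x_t^{-i})\, Q_{t+1}^i(x_{t+1}^i | x_t^i, a_t)$, and $F$ is defined in the proof of Fact 2 (see, in particular, the corresponding claim there).

Furthermore, set

$$V_t^i(\underline{\pi}_t, x_t^i) := E^{\tilde{\gamma}_t^i(\cdot | x_t^i) \tilde{\gamma}_t^{-i},\, \pi_t} \left\{ R^i(X_t, A_t) + V_{t+1}^i(F(\underline{\pi}_t, \tilde{\gamma}_t, A_t), X_{t+1}^i) \,\Big|\, x_t^i \right\}. \tag{9}$$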

It should be noted that in (8), $\tilde{\gamma}_t^i$ is not the outcome of the maximization operation as in a best response equation similar to that of a Bayesian Nash equilibrium. Rather, (8) has the characteristics of a fixed point equation. This is because the maximizer $\tilde{\gamma}_t^i$ appears in both the left-hand side and the right-hand side of the equation. This distinct construction allows the maximization operation to be done with respect to the variable $\gamma_t^i(\cdot | x_t^i)$ for every $x_t^i$ separately, as opposed to being done with respect to the whole function $\gamma_t^i(\cdot | \cdot)$, and is pivotal in the construction.

$^1$ Similar to the existence results shown in [?], in the special case where agent $i$’s instantaneous reward does not depend on its private type $x_t^i$, the fixed point equation always has a type-independent solution $\tilde{\gamma}_t^i(\cdot)$ since it degenerates to a best-response-like equation.
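To make the per-type structure of (8) concrete, consider the last stage $t = T$, where $V_{T+1} = 0$ by (7) and the belief update does not enter. The sketch below enumerates pure prescriptions (one action per type) and keeps those profiles in which, for every player and every type separately, the prescribed action maximizes the expected stage reward under the belief on the other players' types. This is an illustrative brute-force search under stated assumptions, not the paper's algorithm: it ignores mixed solutions, handles only the final stage, and all names (`expected_reward`, `last_stage_pure_fixed_points`) are hypothetical.

```python
from itertools import product

def expected_reward(i, xi, ai, act, types, pi, reward):
    """E[R^i | x^i = xi, a^i = ai] when others follow the pure prescriptions in act.

    act[j]  : dict type -> action for player j
    pi[j]   : dict type -> probability (marginal belief on player j's type)
    reward  : callable (i, x, a) -> float
    """
    n = len(types)
    others = [j for j in range(n) if j != i]
    total = 0.0
    for xs in product(*(types[j] for j in others)):
        x, a, prob = [None] * n, [None] * n, 1.0
        x[i], a[i] = xi, ai
        for j, xj in zip(others, xs):
            x[j], a[j] = xj, act[j][xj]
            prob *= pi[j][xj]  # types are independent, so the belief factorizes
        total += prob * reward(i, tuple(x), tuple(a))
    return total

def last_stage_pure_fixed_points(types, actions, pi, reward, tol=1e-12):
    """All pure prescription profiles solving the last-stage version of (8)."""
    n = len(types)
    # A pure prescription for player i assigns one action to each of its types.
    prescriptions = [list(product(actions[i], repeat=len(types[i]))) for i in range(n)]
    solutions = []
    for profile in product(*prescriptions):
        act = [dict(zip(types[i], profile[i])) for i in range(n)]
        # The optimality check is done for each type xi separately, as in (8).
        ok = all(
            expected_reward(i, xi, act[i][xi], act, types, pi, reward) + tol
            >= max(expected_reward(i, xi, ai, act, types, pi, reward) for ai in actions[i])
            for i in range(n)
            for xi in types[i]
        )
        if ok:
            solutions.append(profile)
    return solutions
```

For earlier stages, the continuation value $V_{t+1}^i$ evaluated at the updated belief would have to be added inside the expectation, with the update driven by the candidate solution itself, which is what makes (8) a fixed point equation rather than a best response.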

To highlight the significance of the structure of (8), we contrast it with two alternate incorrect constructions.

(a) Following the common information approach as in decentralized team problems [?], instead of (8), suppose $\tilde{\gamma}_t$ were constructed as an equilibrium on common agents’ actions $\gamma_t^i$, i.e. for a fixed $\underline{\pi}_t$, $\forall i \in \mathcal{N}$,

$$\tilde{\gamma}_t^i \in \arg\max_{\gamma_t^i} E^{\gamma_t^i \tilde{\gamma}_t^{-i},\, \pi_t} \left\{ R^i(X_t, A_t) + V_{t+1}^i(F(\underline{\pi}_t, \gamma_t^i \tilde{\gamma}_t^{-i}, A_t), X_{t+1}^i) \right\}. \tag{10}$$

It should be noted that in (10), the argument of the maximization operation, $\gamma_t^i$, appears both in the generation of action $A_t^i$ and in the update of the belief $\underline{\pi}_t$. Moreover, (10) is not conditioned on $x_t^i$, the private information of player $i$, similar to the case in the corresponding team problem. This is because the common agent, who does not observe the private information of player $i$, averages out that information. While this averaging of private information works for the team problem, whose objective is to maximize the total expected reward, for the case with strategic players it is incompatible with the sequential rationality condition in (6), which requires conditioning on the entire history $(a_{1:t-1}, x_{1:t}^i)$ and not just the common information $a_{1:t-1}$. If the private information is also conditioned on, the construction still remains invalid, as discussed next.

(b) Instead of (8), suppose $\tilde{\gamma}_t$ were constructed as a best response of player $i$ to other players’ actions $\tilde{\gamma}_t^{-i}$, similar to a standard Bayesian Nash equilibrium. For a fixed $\underline{\pi}_t$, $\forall i \in \mathcal{N},\ x_t^i \in \mathcal{X}^i$,

$$\tilde{\gamma}_t^i \in \arg\max_{\gamma_t^i} E^{\gamma_t^i \tilde{\gamma}_t^{-i},\, \pi_t} \left\{ R^i(X_t, A_t) + V_{t+1}^i(F(\underline{\pi}_t, \gamma_t^i \tilde{\gamma}_t^{-i}, A_t), X_{t+1}^i) \,\Big|\, x_t^i \right\}. \tag{11}$$

Then $\tilde{\gamma}_t^i$ would be a function of $\tilde{\gamma}_t^{-i}$ and $x_t^i$ through a best response relation $\tilde{\gamma}_t^i \in BR_{x_t^i}^i(\tilde{\gamma}_t^{-i})$, where $BR_{x_t^i}^i$ is appropriately defined from (11). Consequently, every component of the solution of the fixed point equation $(\tilde{\gamma}_t^i \in BR_{x_t^i}^i(\tilde{\gamma}_t^{-i}))_{i \in \mathcal{N}}$, if it existed, would be a function of the whole type profile $x_t$, resulting in a mapping $\tilde{\gamma}_t = \theta_t[\underline{\pi}_t, x_t]$. Since player $i$ only observes its own type $x_t^i$, it would not be able to implement the corresponding $\tilde{\gamma}_t^i$, and therefore the construction would be invalid.


C. Forward Recursion 


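As discussed above, a pair of strategy and belief profile $(\beta^*, \mu^*)$ is a PBE if it satisfies (6). Based on $\theta$ defined above in (7)-(9), we now construct a set of strategies $\beta^*$ and beliefs $\mu^*$ for the game $\mathfrak{D}$ in a forward recursive way, as follows$^2$. As before, we will use the notation $\underline{\mu}_t^*[a_{1:t-1}] := (\mu_t^{*,i}[a_{1:t-1}])_{i \in \mathcal{N}}$, where $\mu_t^*[a_{1:t-1}]$ can be constructed from $\underline{\mu}_t^*[a_{1:t-1}]$ as $\mu_t^*[a_{1:t-1}](x_t) = \prod_{i=1}^{N} \mu_t^{*,i}[a_{1:t-1}](x_t^i)$, where $\mu_t^{*,i}[a_{1:t-1}]$ is a belief on $x_t^i$.

1. Initialize at time $t = 1$,

$$\mu_1^*[\phi](x_1) := \prod_{i=1}^{N} Q_1^i(x_1^i). \tag{12}$$

2. For $t = 1, 2, \ldots, T$, $\forall i \in \mathcal{N},\ a_{1:t} \in \mathcal{H}_{t+1}^c,\ x_{1:t}^i \in (\mathcal{X}^i)^t$,

$$\beta_t^{*,i}(a_t^i | a_{1:t-1}, x_{1:t}^i) = \beta_t^{*,i}(a_t^i | a_{1:t-1}, x_t^i) := \theta_t^i[\underline{\mu}_t^*[a_{1:t-1}]](a_t^i | x_t^i) \tag{13}$$

and

$$\mu_{t+1}^{*,i}[a_{1:t}] := \bar{F}(\mu_t^{*,i}[a_{1:t-1}], \theta_t^i[\underline{\mu}_t^*[a_{1:t-1}]], a_t), \tag{14}$$

where $\bar{F}$ is defined in the proof of Fact 2 (see, in particular, the corresponding claim there).

$^2$ As discussed at the start of Section IV, beliefs at time $t$ are functions of each agent’s history $h_t^i$ only through the common history $h_t^c$ and are the same for all agents.

We now state our main result.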

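Theorem 1: A strategy and belief profile $(\beta^*, \mu^*)$, constructed through the backward/forward recursion algorithm described in Section IV, is a PBE of the game, i.e. $\forall i \in \mathcal{N},\ t \in \mathcal{T},\ a_{1:t-1} \in \mathcal{H}_t^c,\ x_{1:t}^i \in (\mathcal{X}^i)^t,\ \beta^i$,

$$E^{\beta_{t:T}^{*,i} \beta_{t:T}^{*,-i},\, \mu_t^*[a_{1:t-1}]} \left\{ \sum_{n=t}^{T} R^i(X_n, A_n) \,\Big|\, a_{1:t-1}, x_{1:t}^i \right\} \ge E^{\beta_{t:T}^{i} \beta_{t:T}^{*,-i},\, \mu_t^*[a_{1:t-1}]} \left\{ \sum_{n=t}^{T} R^i(X_n, A_n) \,\Big|\, a_{1:t-1}, x_{1:t}^i \right\}. \tag{15}$$

Proof: See Appendix. ■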

An intuitive explanation for why all players are able to use a common belief is the following. The sequence of beliefs defined above serves two purposes. First, for any player $i$, it puts a belief on $x_t^{-i}$ to compute an expectation on the current and future rewards. Secondly, it predicts the actions of the other players, since their strategies are functions of these beliefs. Since for any strategy profile, $x_t^i$ is conditionally independent of $x_t^{-i}$ given the common history $a_{1:t-1}$, and since other players do not observe $x_t^i$, knowledge of $x_t^i$ does not affect this belief, and thus in our definition all players can use the same belief $\mu^*$, which is independent of their private information.

Independence of types is a crucial assumption in proving the above result, which manifests itself in Lemma [?] in the Appendix, used in the proof of Theorem 1. This is because, at equilibrium, player $i$’s reward-to-go at time $t$, conditioned on its type $x_t^i$, depends on its strategy at time $t$, $\beta_t^i$, only through its action $a_t^i$ and is independent of the corresponding partial function $\beta_t^i(\cdot | a_{1:t-1}, \cdot)$. In other words, given $x_t^i$ and $a_t^i$, player $i$’s reward-to-go is independent of $\beta_t^i$. We discuss this in more detail below.

At equilibrium, all players observe past actions $a_{1:t-1}$ and update their belief $\underline{\pi}_t$, which is the same as $\underline{\mu}_t^*[a_{1:t-1}]$, through the equilibrium strategy profile $\beta^*$. Now suppose at time $t$, player $i$ decides to unilaterally deviate to $\hat{\beta}_t^i$ at time $t$ for some history $a_{1:t-1}$, keeping the rest of its strategy the same. Then other players still update their beliefs $(\underline{\pi}_{t'})_{t' \in \{t+1, \ldots, T\}}$ in the same way as before and take their actions through the equilibrium strategy $\beta_t^{*,-i}$ operated on $\underline{\pi}_t$ and $x_t^{-i}$, whereas player $i$ forms a new belief $\hat{\pi}_{t+1}$ on $x_{t+1}$, which depends on the strategy profile $\beta_{1:t-1}^* \hat{\beta}_t^i \beta_t^{*,-i}$. Thus at time $t$ player $i$ would need both the beliefs $\underline{\pi}_{t+1}$ and $\hat{\pi}_{t+1}$ to compute its expected future reward: $\underline{\pi}_{t+1}$ to predict other players’ actions and $\hat{\pi}_{t+1}$ to form a true belief on $x_{t+1}$ based on its information. As it turns out, due to independence of types, $\hat{\pi}_{t+1}$ does not provide additional information to player $i$ to compute its future expected reward and thus it can be discarded. Intuitively, this is so because the belief on type $j$, $\pi_{t+1}^j$, is a function of the strategy and action of player $j$ till time $t$ (as shown in Claim 1 in the proof of Theorem 1 in the Appendix); thus $\hat{\pi}_{t+1}^{-i} = \pi_{t+1}^{-i}$. Now since player $i$ already observes its type $x_t^i$, its belief $\hat{\pi}_t^i$ on $x_t^i$ does not provide any additional information to player $i$, and thus $\underline{\pi}_t$ (which is the same as $\underline{\mu}_t^*[a_{1:t-1}]$) suffices to compute the future expected reward for player $i$. Also, $\underline{\pi}_{t+1}$ is updated from $\underline{\pi}_t$, $\beta_t^*(\cdot | a_{1:t-1}, \cdot)$ and $a_t$, and is independent of $\hat{\beta}_t^i$ given $a_t^i$. This implies player $i$ can use the equilibrium strategy $\beta_t^{*,i}$ to update its future belief, as used in (8). Then by construction of $\theta$, and specifically due to (8), player $i$ does not gain by unilaterally deviating at time $t$ keeping the remainder of its strategy the same.

Finally, we note that in the two-step backward-forward algorithm described above, once the equilibrium generating function $\theta$ is defined through backward recursion, the SPBEs can be generated through forward recursion for any prior distribution $Q$ on types $X$. Since, in comparison to the backward recursion, the forward recursive part of the algorithm is computationally insignificant, the algorithm computes SPBEs for different prior distributions at the same time.
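The forward pass in (12)-(14) can be sketched as a short loop: at each period, evaluate the equilibrium generating function at the current belief vector to obtain the prescriptions, then update each marginal belief from the realized actions. In the sketch below, the callables `theta` and `update` stand in for the equilibrium generating function and the per-player belief update; their signatures, and the function name `forward_recursion`, are assumptions made for illustration.

```python
def forward_recursion(theta, update, prior, actions_path):
    """Generate beliefs and prescriptions along one realized action path.

    theta        : callable (t, pi) -> tuple of partial functions, one per player
                   (stands in for the equilibrium generating function)
    update       : callable (pi_i, gamma_i, a, i) -> next marginal belief for player i
                   (stands in for the per-player belief update)
    prior        : tuple of marginal beliefs at t = 1, one dict per player, as in (12)
    actions_path : list of realized action profiles a_1, ..., a_T
    Returns a list of (belief vector, prescription profile) pairs for t = 1..T.
    """
    pi = tuple(prior)
    out = []
    for t, a in enumerate(actions_path, start=1):
        gamma = theta(t, pi)            # equilibrium prescriptions at this belief, as in (13)
        out.append((pi, gamma))
        pi = tuple(                     # marginal-by-marginal belief update, as in (14)
            update(pi[i], gamma[i], a, i) for i in range(len(pi))
        )
    return out
```

Because `theta` is evaluated only at the beliefs actually reached, the same backward-computed function can be reused with any `prior`, which reflects the remark above about different prior distributions.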

In the next section, we discuss an example to illustrate the methodology described above for the construction of SPBEs.
V. Illustrative example: A two stage public goods game 

We consider a discrete version of Example 8.3 from [?, ch. 8], which is an instance of a repeated public good game. There are two players who play a two-period game. In each period $t$, they simultaneously decide whether to contribute to the period $t$ public good, which is a binary decision $a_t^i \in \{0, 1\}$ for player $i = 1, 2$. Before the start of period 2, both players know the actions taken by them in period 1. In both periods, each player gets reward 1 if at least one of them contributed and 0 if neither does. Player $i$’s cost of contributing is $x^i$, which is its private information. Both players believe that the $x^i$’s are drawn independently and identically with probability distribution $Q$ with support $\{x^L, x^H\}$, $0 < x^L < 1 < x^H$, such that $P^Q(X^i = x^H) = q$, where $0 < q < 1$.

This example is similar to our model, where $N = 2$, $T = 2$ and the reward for player $i$ in period $t$ is

$$R^i(x, a_t) = \begin{cases} a_t^{-i} & \text{if } a_t^i = 0 \\ 1 - x^i & \text{if } a_t^i = 1. \end{cases} \tag{16}$$


We will use the backward recursive algorithm, defined in Section IV, to find an SPBE of this game. For period $t = 1, 2$ and for $i = 1, 2$, the partial functions $\gamma_t^i$ can equivalently be defined through scalars $p_t^{iL}$ and $p_t^{iH}$ such that $\gamma_t^i(1 | x^L) = p_t^{iL}$, $\gamma_t^i(0 | x^L) = 1 - p_t^{iL}$ and $\gamma_t^i(1 | x^H) = p_t^{iH}$, $\gamma_t^i(0 | x^H) = 1 - p_t^{iH}$, where $p_t^{iL}, p_t^{iH} \in [0, 1]$. Henceforth, we will use $p_t^{iL}$ and $p_t^{iH}$ interchangeably with the corresponding $\gamma_t^i$.

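For $t = 2$ and for any fixed $\pi_2 = (\pi_2^1, \pi_2^2)$, where $\pi_2^i \in [0,1]$ represents a probability measure
on the event $\{X^i = x^H\}$, player $i$'s reward is

$$E\{R^i(X, A_2) \mid \pi_2, X^i = x^L\} = (1 - p_2^{iL})\left((1 - \pi_2^{-i})p_2^{-iL} + \pi_2^{-i}p_2^{-iH}\right) + p_2^{iL}(1 - x^L), \qquad (17a)$$

$$E\{R^i(X, A_2) \mid \pi_2, X^i = x^H\} = (1 - p_2^{iH})\left((1 - \pi_2^{-i})p_2^{-iL} + \pi_2^{-i}p_2^{-iH}\right) + p_2^{iH}(1 - x^H). \qquad (17b)$$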

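Let $\tilde\gamma_2 = \theta_2[\pi_2]$ and equivalently $(\tilde p_2^{1L}, \tilde p_2^{2L}, \tilde p_2^{1H}, \tilde p_2^{2H}) = \theta_2[\pi_2]$ be defined through the following fixed point
equation, which is equivalent to ([^. For $i = 1, 2$

$$\tilde p_2^{iL} \in \arg\max_{p_2^{iL}} \; (1 - p_2^{iL})\left((1 - \pi_2^{-i})\tilde p_2^{-iL} + \pi_2^{-i}\tilde p_2^{-iH}\right) + p_2^{iL}(1 - x^L), \qquad (18a)$$

$$\tilde p_2^{iH} \in \arg\max_{p_2^{iH}} \; (1 - p_2^{iH})\left((1 - \pi_2^{-i})\tilde p_2^{-iL} + \pi_2^{-i}\tilde p_2^{-iH}\right) + p_2^{iH}(1 - x^H). \qquad (18b)$$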


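Since $1 - x^H < 0$, $\tilde p_2^{iH} = 0$ achieves the maximum in (18b). Thus (18a)-(18b) can be reduced to, $\forall i \in \{1, 2\}$

$$\tilde p_2^{iL} \in \arg\max_{p_2^{iL}} \; (1 - p_2^{iL})(1 - \pi_2^{-i})\tilde p_2^{-iL} + p_2^{iL}(1 - x^L). \qquad (19)$$

The objective in (19) is linear in $p_2^{iL}$ with slope $(1 - x^L) - (1 - \pi_2^{-i})\tilde p_2^{-iL}$. This implies,

$$\tilde p_2^{iL} = \begin{cases} 0 & \text{if } x^L > 1 - (1 - \pi_2^{-i})\tilde p_2^{-iL}, \\ 1 & \text{if } x^L < 1 - (1 - \pi_2^{-i})\tilde p_2^{-iL}, \\ \text{arbitrary} & \text{if } x^L = 1 - (1 - \pi_2^{-i})\tilde p_2^{-iL}. \end{cases} \qquad (20)$$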
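The case analysis in (20) can be checked mechanically. The following Python sketch is illustrative only: the names `best_response_low` and `is_period2_fixed_point` and the tolerance `tol` are assumptions introduced here, and high-cost types are taken to never contribute, as argued for (18b).

```python
def best_response_low(x_low, pi_other, p_other_low, tol=1e-9):
    """Interval of maximizers of (19) for the low-cost type, following (20).

    pi_other    : belief that the opponent has the high cost x^H
    p_other_low : probability that the opponent's low-cost type contributes
    Returns (lo, hi), the closed interval of optimal contribution probabilities.
    """
    threshold = 1.0 - (1.0 - pi_other) * p_other_low
    if x_low > threshold + tol:
        return (0.0, 0.0)  # contributing is strictly worse
    if x_low < threshold - tol:
        return (1.0, 1.0)  # contributing is strictly better
    return (0.0, 1.0)      # indifferent: any mixture is optimal


def is_period2_fixed_point(p1_low, p2_low, pi1, pi2, x_low, tol=1e-9):
    """Check that (p1_low, p2_low, 0, 0) satisfies (20) for both players."""
    lo1, hi1 = best_response_low(x_low, pi2, p2_low, tol)
    lo2, hi2 = best_response_low(x_low, pi1, p1_low, tol)
    return lo1 <= p1_low <= hi1 and lo2 <= p2_low <= hi2


if __name__ == "__main__":
    x_low = 0.2
    # Pure profile (0, 1, 0, 0) at beliefs (0.5, 0.1).
    print(is_period2_fixed_point(0.0, 1.0, 0.5, 0.1, x_low))
    # Profile (1, 1, 0, 0) at beliefs (0.1, 0.1).
    print(is_period2_fixed_point(1.0, 1.0, 0.1, 0.1, x_low))
```

The tolerance is a design choice for floating-point arithmetic: in a mixed solution the indifference condition holds only up to rounding, so an exact equality test would misclassify it.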


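The fixed point equation (20) has the following solutions,

1) $(\tilde p_2^{1L}, \tilde p_2^{2L}, \tilde p_2^{1H}, \tilde p_2^{2H}) = (0,1,0,0)$ for $\pi_2^1 \in [0,1]$, $\pi_2^2 < x^L$

• $V_2^1(\pi_2, x^L) = 1 - \pi_2^2$

• $V_2^1(\pi_2, x^H) = 1 - \pi_2^2$

• $V_2^2(\pi_2, x^L) = 1 - x^L$

• $V_2^2(\pi_2, x^H) = 0$.

2) $(\tilde p_2^{1L}, \tilde p_2^{2L}, \tilde p_2^{1H}, \tilde p_2^{2H}) = (1,0,0,0)$ for $\pi_2^1 < x^L$, $\pi_2^2 \in [0,1]$

• $V_2^1(\pi_2, x^L) = 1 - x^L$

• $V_2^1(\pi_2, x^H) = 0$

• $V_2^2(\pi_2, x^L) = 1 - \pi_2^1$

• $V_2^2(\pi_2, x^H) = 1 - \pi_2^1$.

3) $(\tilde p_2^{1L}, \tilde p_2^{2L}, \tilde p_2^{1H}, \tilde p_2^{2H}) = (1,1,0,0)$ for $\pi_2^1 > x^L$, $\pi_2^2 > x^L$

• $V_2^1(\pi_2, x^L) = 1 - x^L$

• $V_2^1(\pi_2, x^H) = 1 - \pi_2^2$

• $V_2^2(\pi_2, x^L) = 1 - x^L$

• $V_2^2(\pi_2, x^H) = 1 - \pi_2^1$.

4) $(\tilde p_2^{1L}, \tilde p_2^{2L}, \tilde p_2^{1H}, \tilde p_2^{2H}) = (1, \tilde p_2^{2L}, 0, 0)$ for $\pi_2^1 = x^L$, $\pi_2^2 \in [0,1]$, where $\tilde p_2^{2L} \in [0,1]$ satisfies $(1 - \pi_2^2)\tilde p_2^{2L} \le 1 - x^L$

• $V_2^1(\pi_2, x^L) = 1 - x^L$

• $V_2^1(\pi_2, x^H) = (1 - \pi_2^2)\tilde p_2^{2L}$

• $V_2^2(\pi_2, x^L) = 1 - x^L$

• $V_2^2(\pi_2, x^H) = 1 - \pi_2^1$.

5) $(\tilde p_2^{1L}, \tilde p_2^{2L}, \tilde p_2^{1H}, \tilde p_2^{2H}) = (\tilde p_2^{1L}, 1, 0, 0)$ for $\pi_2^1 \in [0,1]$, $\pi_2^2 = x^L$, where $\tilde p_2^{1L} \in [0,1]$ satisfies $(1 - \pi_2^1)\tilde p_2^{1L} \le 1 - x^L$

• $V_2^1(\pi_2, x^L) = 1 - x^L$

• $V_2^1(\pi_2, x^H) = 1 - \pi_2^2$

• $V_2^2(\pi_2, x^L) = 1 - x^L$

• $V_2^2(\pi_2, x^H) = (1 - \pi_2^1)\tilde p_2^{1L}$.

6) $(\tilde p_2^{1L}, \tilde p_2^{2L}, \tilde p_2^{1H}, \tilde p_2^{2H}) = \left(\frac{1 - x^L}{1 - \pi_2^1}, \frac{1 - x^L}{1 - \pi_2^2}, 0, 0\right)$ for $\pi_2^1 < x^L$, $\pi_2^2 < x^L$

• $V_2^1(\pi_2, x^L) = 1 - x^L$

• $V_2^1(\pi_2, x^H) = 1 - x^L$

• $V_2^2(\pi_2, x^L) = 1 - x^L$

• $V_2^2(\pi_2, x^H) = 1 - x^L$.

Figure 1 shows these solutions in the space of $(\pi_2^1, \pi_2^2)$.

Fig. 1: Solutions of fixed point equation in (20)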


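Thus for any $\pi_2$, there can exist multiple equilibria and correspondingly multiple $\theta_2[\pi_2]$ can be defined.
For any particular $\theta_2$, at $t = 1$, the fixed point equation that needs to be solved is of the form, $\forall i \in \{1,2\}$

$$\begin{aligned}
\tilde p_1^{iL} \in \arg\max_{p_1^{iL}} \; & (1 - p_1^{iL})\left((1 - q)\tilde p_1^{-iL} + q\tilde p_1^{-iH} + E^{\tilde\gamma_1^{-i}}\{V_2^i(\underline{F}(Q^2, \tilde\gamma_1, (0, A_1^{-i})), x^L)\}\right) \\
& + p_1^{iL}\left(1 - x^L + E^{\tilde\gamma_1^{-i}}\{V_2^i(\underline{F}(Q^2, \tilde\gamma_1, (1, A_1^{-i})), x^L)\}\right), && \text{(21a)} \\
\tilde p_1^{iH} \in \arg\max_{p_1^{iH}} \; & (1 - p_1^{iH})\left((1 - q)\tilde p_1^{-iL} + q\tilde p_1^{-iH} + E^{\tilde\gamma_1^{-i}}\{V_2^i(\underline{F}(Q^2, \tilde\gamma_1, (0, A_1^{-i})), x^H)\}\right) \\
& + p_1^{iH}\left(1 - x^H + E^{\tilde\gamma_1^{-i}}\{V_2^i(\underline{F}(Q^2, \tilde\gamma_1, (1, A_1^{-i})), x^H)\}\right), && \text{(21b)}
\end{aligned}$$

where $\underline{F}(Q^2, \gamma_1, A_1) = F(Q, \gamma_1^1, A_1^1)\,F(Q, \gamma_1^2, A_1^2)$, and

$$F(Q, \gamma_1^i, 0) = \frac{q(1 - p_1^{iH})}{q(1 - p_1^{iH}) + (1 - q)(1 - p_1^{iL})}, \qquad (22a)$$

$$F(Q, \gamma_1^i, 1) = \frac{q\,p_1^{iH}}{q\,p_1^{iH} + (1 - q)\,p_1^{iL}}, \qquad (22b)$$

if the denominators in (22a)-(22b) are strictly positive, else $F(Q, \gamma_1^i, A_1^i) = Q$, as in the zero-denominator case of Claim 5.
A solution of the fixed point equation in (21a)-(21b) defines $\theta_1[Q^2]$.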
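As a minimal sketch of the update in (22a)-(22b), the following Python function returns the posterior probability of the high-cost type after one observed action. The function name and argument names are assumptions introduced for illustration; the fallback to the prior on a zero denominator follows the convention stated above.

```python
def belief_update(q, p_low, p_high, action):
    """Posterior probability that a player has cost x^H, per (22a)-(22b).

    q      : prior probability of the high-cost type
    p_low  : probability that the low-cost type contributes (action 1)
    p_high : probability that the high-cost type contributes (action 1)
    action : observed action, 0 or 1
    """
    like_high = p_high if action == 1 else 1.0 - p_high
    like_low = p_low if action == 1 else 1.0 - p_low
    denom = q * like_high + (1.0 - q) * like_low
    if denom <= 0.0:
        return q  # zero-probability action: keep the prior
    return q * like_high / denom


if __name__ == "__main__":
    # A player whose low type always contributes and whose high type never does
    # reveals her type through her action.
    print(belief_update(0.1, 1.0, 0.0, 1))
    print(belief_update(0.1, 1.0, 0.0, 0))
```

Because beliefs factorize across players, the joint update applies this function once per player, each with that player's own action.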

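Using one such $\theta$ defined as follows, we find an SPBE of the game for $q = 0.1$, $x^L = 0.2$, $x^H = 1.2$. We
use $\theta_2[\pi_2]$ as one possible set of solutions of (20), shown in Figure 1 and described below.

$$\theta_2[\pi_2] = \begin{cases}
\left(\frac{1 - x^L}{1 - \pi_2^1}, \frac{1 - x^L}{1 - \pi_2^2}, 0, 0\right) & \pi_2^1 \in [0, x^L),\ \pi_2^2 \in [0, x^L) \\
(1, 0, 0, 0) & \pi_2^1 \in [0, x^L],\ \pi_2^2 \in [x^L, 1] \\
(0, 1, 0, 0) & \pi_2^1 \in [x^L, 1],\ \pi_2^2 \in [0, x^L] \\
(1, 1, 0, 0) & \pi_2^1 \in (x^L, 1],\ \pi_2^2 \in (x^L, 1].
\end{cases} \qquad (23)$$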

Then, through iteration on the fixed point equation (21a)-(21b) and using the aforementioned $\theta_2[\pi_2]$, we
numerically find (and analytically verify) that $\theta_1[Q^2] = (\tilde p_1^{1L}, \tilde p_1^{2L}, \tilde p_1^{1H}, \tilde p_1^{2H}) = (0, 1, 0, 0)$ is a fixed point.
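The fixed-point claim can be reproduced with a short script. The sketch below is not the authors' code: all function names are assumptions, `theta2` encodes one reading of the selection (23) (the overlapping boundary point is resolved by the order of the tests), and the check simply compares the two-period payoff of contributing and not contributing for each player and type.

```python
Q, X_L, X_H = 0.1, 0.2, 1.2  # q, x^L, x^H used in the text


def belief_update(q, p_low, p_high, action):
    """Posterior probability of the high-cost type; prior on zero denominator."""
    like_high = p_high if action == 1 else 1.0 - p_high
    like_low = p_low if action == 1 else 1.0 - p_low
    denom = q * like_high + (1.0 - q) * like_low
    return q if denom <= 0.0 else q * like_high / denom


def theta2(pi1, pi2):
    """Selection (23); returns (p^{1L}, p^{2L}, p^{1H}, p^{2H}) at t = 2."""
    if pi1 < X_L and pi2 < X_L:
        return ((1 - X_L) / (1 - pi1), (1 - X_L) / (1 - pi2), 0.0, 0.0)
    if pi1 <= X_L and pi2 >= X_L:
        return (1.0, 0.0, 0.0, 0.0)
    if pi1 >= X_L and pi2 <= X_L:
        return (0.0, 1.0, 0.0, 0.0)
    return (1.0, 1.0, 0.0, 0.0)


def value2(i, x, pi):
    """Period-2 expected reward of player i (0 or 1) with cost x under theta2."""
    p = theta2(*pi)
    j = 1 - i
    own = p[i] if x == X_L else p[2 + i]
    other = (1 - pi[j]) * p[j] + pi[j] * p[2 + j]  # prob. the opponent contributes
    return (1 - own) * other + own * (1 - x)


def payoff1(i, x, a, p1):
    """Two-period payoff of player i with cost x playing a at t = 1, cf. (21)."""
    j = 1 - i
    total = 0.0
    for x_j, prob_x in ((X_L, 1 - Q), (X_H, Q)):
        p_j = p1[j] if x_j == X_L else p1[2 + j]
        for a_j, prob_a in ((1, p_j), (0, 1 - p_j)):
            if prob_a == 0.0:
                continue
            acts = [0, 0]
            acts[i], acts[j] = a, a_j
            pi_next = tuple(
                belief_update(Q, p1[k], p1[2 + k], acts[k]) for k in (0, 1)
            )
            stage = (1 - x) if a == 1 else float(a_j)
            total += prob_x * prob_a * (stage + value2(i, x, pi_next))
    return total


def is_period1_fixed_point(p1, tol=1e-9):
    """True if every component of p1 is a best response given p1 itself."""
    for i in (0, 1):
        for x, p in ((X_L, p1[i]), (X_H, p1[2 + i])):
            gain = payoff1(i, x, 1, p1) - payoff1(i, x, 0, p1)
            if (gain > tol and p < 1.0) or (gain < -tol and p > 0.0):
                return False
    return True


if __name__ == "__main__":
    print(is_period1_fixed_point((0.0, 1.0, 0.0, 0.0)))
```

Under these assumptions, player 1's low type compares roughly 1.6 (contribute) with 1.7 (do not contribute), and player 2's low type compares 1.6 with 0.9, which is consistent with the profile $(0, 1, 0, 0)$.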






























Thus

$\beta_1^1(A_1^1 = 1 \mid X^1 = x^L) = 0$, $\quad \beta_1^2(A_1^2 = 1 \mid X^2 = x^L) = 1$,

$\beta_1^1(A_1^1 = 1 \mid X^1 = x^H) = 0$, $\quad \beta_1^2(A_1^2 = 1 \mid X^2 = x^H) = 0$,

with beliefs $\mu_2^*[00] = (q, 1)$, $\mu_2^*[01] = (q, 0)$, $\mu_2^*[10] = (q, 1)$, $\mu_2^*[11] = (q, 0)$ and $(\beta_2^i(\cdot \mid a_1, \cdot))_{i \in \{1,2\}} = \theta_2[\mu_2^*[a_1]]$
is an SPBE of the game. In this equilibrium, player 2 at time $t = 1$ contributes according to her
type whereas player 1 never contributes; thus player 2 reveals her private information through her action
whereas player 1 does not. Since $\theta_2$ is symmetric, there also exists an (antisymmetric) equilibrium where at
time $t = 1$ the players' strategies reverse, i.e. player 2 never contributes and player 1 contributes according to
her type. We also obtain a symmetric equilibrium where $\theta_1[Q^2]$ = ( […], […], 0, 0) is a fixed
point when $x^L$ > […], resulting in beliefs $\mu_2^*[00] = (p, p)$, $\mu_2^*[01] = (p, 0)$, $\mu_2^*[10] = (0, p)$, $\mu_2^*[11] = (0, 0)$
where $p$ = […]


VI. Conclusion 

In this paper, we study a class of dynamic games with asymmetric information where player $i$ observes its
true private type $x_t^i$ and, together with the other players, observes the past actions of everyone. The types of the
players evolve as conditionally independent, controlled Markov processes, conditioned on the players' current
actions. We present a two-step backward-forward recursive algorithm to find SPBE of this game, where
equilibrium strategies are functions of a Markov belief state $\pi_t$, which depends on the common information,
and of the current private types of the players. The backward recursive part of this algorithm defines an equilibrium
generating function. Each period in the backward recursion involves solving a fixed point equation on the space
of probability simplexes for every possible belief on types. Then, using this function, equilibrium strategies
and beliefs are defined through a forward recursion.

In this paper we consider perfectly observable, independent dynamic types of the players. Future work
includes considering models where players do not perfectly observe their types but instead make
noisy observations. In general, this methodology opens the door for finding PBEs for many applications,
analytically or numerically, which was not feasible before. One such case would be dynamic LQG games
where types evolve linearly with Gaussian noise and players incur quadratic cost.
Appendix A
Proof of Fact 1

We prove this Fact in the following steps.

(a) In Claim 1, we prove that for any policy profile $g$ and $\forall t \in \mathcal{T}$, $x^i_{1:t}$ for $i \in \mathcal{N}$ are conditionally
independent given the common information $a_{1:t-1}$.

(b) In Claim 2, using Claim 1, we prove that for every fixed strategy $g^{-i}$ of the players $-i$, $((a_{1:t-1}, x_t^i), a_t^i)_{t \in \mathcal{T}}$
is a controlled Markov process for player $i$.

(c) For a given policy $g$, we define a policy $s^i$ of player $i$ from $g$ as $s_t^i(a_t^i \mid a_{1:t-1}, x_t^i) = P^g(a_t^i \mid a_{1:t-1}, x_t^i)$.

(d) In Claim 3, we prove that the dynamics of this controlled Markov process $((a_{1:t-1}, x_t^i), a_t^i)_{t \in \mathcal{T}}$ under
$(s^i g^{-i})$ are the same as under $g$, i.e. $P^{s^i g^{-i}}(x_t^i, x_{t+1}^i, a_{1:t}) = P^g(x_t^i, x_{t+1}^i, a_{1:t})$.

(e) In Claim 4, we prove that w.r.t. random variables $(x_t, a_t)$, $x_t^i$ is sufficient for player $i$'s private
information history $x^i_{1:t}$, i.e. $P^g(x_t, a_t \mid a_{1:t-1}, x^i_{1:t}, a_t^i) = P^{g^{-i}}(x_t, a_t \mid a_{1:t-1}, x_t^i, a_t^i)$.

(f) From (c), (d) and (e) we then prove the result of the Fact that $P^{s^i g^{-i}}(x_t, a_t) = P^g(x_t, a_t)$.
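Claim 1: For any policy profile $g$ and $\forall t$,

$$P^g(x_{1:t} \mid a_{1:t-1}) = \prod_{i=1}^{N} P^{g^i}(x^i_{1:t} \mid a_{1:t-1}). \qquad (25)$$

Proof:

$$\begin{aligned}
P^g(x_{1:t} \mid a_{1:t-1}) &= \frac{P^g(x_{1:t}, a_{1:t-1})}{\sum_{\bar x_{1:t}} P^g(\bar x_{1:t}, a_{1:t-1})} && \text{(26a)} \\
&= \frac{\prod_{i=1}^{N}\left(Q_1^i(x_1^i)\prod_{n=2}^{t} g_{n-1}^i(a_{n-1}^i \mid a_{1:n-2}, x^i_{1:n-1})\,Q_n^i(x_n^i \mid x_{n-1}^i, a_{n-1})\right)}{\sum_{\bar x_{1:t}}\prod_{i=1}^{N}\left(Q_1^i(\bar x_1^i)\prod_{n=2}^{t} g_{n-1}^i(a_{n-1}^i \mid a_{1:n-2}, \bar x^i_{1:n-1})\,Q_n^i(\bar x_n^i \mid \bar x_{n-1}^i, a_{n-1})\right)} && \text{(26b)} \\
&= \frac{\prod_{i=1}^{N}\left(Q_1^i(x_1^i)\prod_{n=2}^{t} g_{n-1}^i(a_{n-1}^i \mid a_{1:n-2}, x^i_{1:n-1})\,Q_n^i(x_n^i \mid x_{n-1}^i, a_{n-1})\right)}{\prod_{i=1}^{N}\left(\sum_{\bar x^i_{1:t}} Q_1^i(\bar x_1^i)\prod_{n=2}^{t} g_{n-1}^i(a_{n-1}^i \mid a_{1:n-2}, \bar x^i_{1:n-1})\,Q_n^i(\bar x_n^i \mid \bar x_{n-1}^i, a_{n-1})\right)} && \text{(26c)} \\
&= \prod_{i=1}^{N}\frac{Q_1^i(x_1^i)\prod_{n=2}^{t} g_{n-1}^i(a_{n-1}^i \mid a_{1:n-2}, x^i_{1:n-1})\,Q_n^i(x_n^i \mid x_{n-1}^i, a_{n-1})}{\sum_{\bar x^i_{1:t}} Q_1^i(\bar x_1^i)\prod_{n=2}^{t} g_{n-1}^i(a_{n-1}^i \mid a_{1:n-2}, \bar x^i_{1:n-1})\,Q_n^i(\bar x_n^i \mid \bar x_{n-1}^i, a_{n-1})} && \text{(26d)} \\
&= \prod_{i=1}^{N} P^{g^i}(x^i_{1:t} \mid a_{1:t-1}). && \text{(26e)}
\end{aligned}$$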

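Claim 2: For a fixed $g^{-i}$, $\{(a_{1:t-1}, x_t^i), a_t^i\}_t$ is a controlled Markov process with state $(a_{1:t-1}, x_t^i)$ and
control action $a_t^i$.

Proof:

$$\begin{aligned}
P^g(\tilde a_{1:t}, x_{t+1}^i \mid a_{1:t-1}, x^i_{1:t}, a_t^i)
&= \sum_{x^{-i}_{1:t}} P^g(\tilde a_{1:t}, x_{t+1}^i, x^{-i}_{1:t} \mid a_{1:t-1}, x^i_{1:t}, a_t^i) && \text{(27a)} \\
&= \sum_{x^{-i}_{1:t}} P^g(x^{-i}_{1:t} \mid a_{1:t-1}, x^i_{1:t}, a_t^i)\,P^g(\tilde a_{1:t}, x_{t+1}^i \mid a_{1:t-1}, x_{1:t}, a_t^i) && \text{(27b)} \\
&= \sum_{x^{-i}_{1:t}} P^{g^{-i}}(x^{-i}_{1:t} \mid a_{1:t-1})\Big(\prod_{j \neq i} g_t^j(\tilde a_t^j \mid a_{1:t-1}, x^j_{1:t})\Big) Q^i(x_{t+1}^i \mid x_t^i, a_t^i, \tilde a_t^{-i})\,I_{(a_{1:t-1}, a_t^i)}(\tilde a_{1:t-1}, \tilde a_t^i) && \text{(27c)} \\
&= P^{g^{-i}}(\tilde a_{1:t}, x_{t+1}^i \mid a_{1:t-1}, x_t^i, a_t^i), && \text{(27d)}
\end{aligned}$$

where (27c) follows from Claim 1, since $x^{-i}_{1:t}$ is conditionally independent of $x^i_{1:t}$ given $a_{1:t-1}$ and the
corresponding probability is only a function of $g^{-i}$. ■

For any given policy profile $g$, we construct a policy $s^i$ in the following way,

$$\begin{aligned}
s_t^i(a_t^i \mid a_{1:t-1}, x_t^i) &\triangleq P^g(a_t^i \mid a_{1:t-1}, x_t^i) && \text{(28a)} \\
&= \sum_{x^i_{1:t-1}} P^g(a_t^i, x^i_{1:t-1} \mid a_{1:t-1}, x_t^i) && \text{(28b)} \\
&= \frac{\sum_{x^i_{1:t-1}} P^{g^i}(x^i_{1:t} \mid a_{1:t-1})\,g_t^i(a_t^i \mid a_{1:t-1}, x^i_{1:t})}{\sum_{\tilde x^i_{1:t-1}} P^{g^i}(\tilde x^i_{1:t-1}, x_t^i \mid a_{1:t-1})} && \text{(28c)} \\
&= P^{g^i}(a_t^i \mid a_{1:t-1}, x_t^i), && \text{(28d)}
\end{aligned}$$

where dependence of (28c) on only $g^i$ is due to Claim 1.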
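Claim 3: The dynamics of the Markov process $\{(a_{1:t-1}, x_t^i), a_t^i\}_t$ under $(s^i g^{-i})$ are the same as under $g$,
i.e.

$$P^{s^i g^{-i}}(x_t^i, x_{t+1}^i, a_{1:t}) = P^g(x_t^i, x_{t+1}^i, a_{1:t}) \quad \forall t. \qquad (29)$$

Proof: We prove this by induction. Clearly,

$$P^g(x_1^i) = P^{s^i g^{-i}}(x_1^i) = Q_1^i(x_1^i). \qquad (30)$$

Now suppose (29) is true for $t - 1$, which also implies that the marginals satisfy $P^g(x_t^i, a_{1:t-1}) = P^{s^i g^{-i}}(x_t^i, a_{1:t-1})$.
Then

$$\begin{aligned}
P^g(x_t^i, a_{1:t-1}, x_{t+1}^i, a_t) &= P^g(x_t^i, a_{1:t-1})\,P^g(a_t^i \mid a_{1:t-1}, x_t^i)\,P^g(x_{t+1}^i, a_{1:t} \mid x_t^i, a_{1:t-1}, a_t^i) && \text{(31a)} \\
&= P^{s^i g^{-i}}(x_t^i, a_{1:t-1})\,s_t^i(a_t^i \mid a_{1:t-1}, x_t^i)\,P^{g^{-i}}(x_{t+1}^i, a_{1:t} \mid x_t^i, a_{1:t-1}, a_t^i) && \text{(31b)} \\
&= P^{s^i g^{-i}}(x_t^i, a_{1:t-1}, x_{t+1}^i, a_t), && \text{(31c)}
\end{aligned}$$

where (31b) is true from the induction hypothesis, the definition of $s^i$ in (28d), and since $\{(a_{1:t-1}, x_t^i), a_t^i\}_t$ is a
controlled Markov process as proved in Claim 2 and its update kernel does not depend on policy $g^i$. This
completes the induction step. ■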


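Claim 4: For any policy $g$,

$$P^g(x_t, a_t \mid a_{1:t-1}, x^i_{1:t}, a_t^i) = P^{g^{-i}}(x_t, a_t \mid a_{1:t-1}, x_t^i, a_t^i). \qquad (32)$$

Proof:

$$P^g(\tilde x_t, \tilde a_t \mid a_{1:t-1}, x^i_{1:t}, a_t^i) = I_{x_t^i, a_t^i}(\tilde x_t^i, \tilde a_t^i)\,P^g(\tilde x_t^{-i}, \tilde a_t^{-i} \mid a_{1:t-1}, x^i_{1:t}, a_t^i). \qquad (33)$$

Now

$$\begin{aligned}
P^g(\tilde x_t^{-i}, \tilde a_t^{-i} \mid a_{1:t-1}, x^i_{1:t}, a_t^i)
&= \sum_{x^{-i}_{1:t-1}} P^g(x^{-i}_{1:t-1}, \tilde x_t^{-i}, \tilde a_t^{-i} \mid a_{1:t-1}, x^i_{1:t}, a_t^i) && \text{(34a)} \\
&= \sum_{x^{-i}_{1:t-1}} P^g(x^{-i}_{1:t-1}, \tilde x_t^{-i} \mid a_{1:t-1}, x^i_{1:t}, a_t^i)\prod_{j \neq i} g_t^j(\tilde a_t^j \mid a_{1:t-1}, x^j_{1:t-1}, \tilde x_t^j) && \text{(34b)} \\
&= \sum_{x^{-i}_{1:t-1}} P^{g^{-i}}(x^{-i}_{1:t-1}, \tilde x_t^{-i} \mid a_{1:t-1})\prod_{j \neq i} g_t^j(\tilde a_t^j \mid a_{1:t-1}, x^j_{1:t-1}, \tilde x_t^j) && \text{(34c)} \\
&= P^{g^{-i}}(\tilde x_t^{-i}, \tilde a_t^{-i} \mid a_{1:t-1}), && \text{(34d)}
\end{aligned}$$

where (34c) follows from Claim 1.
Hence

$$\begin{aligned}
P^g(\tilde x_t, \tilde a_t \mid a_{1:t-1}, x^i_{1:t}, a_t^i) &= I_{x_t^i, a_t^i}(\tilde x_t^i, \tilde a_t^i)\,P^{g^{-i}}(\tilde x_t^{-i}, \tilde a_t^{-i} \mid a_{1:t-1}) && \text{(35a)} \\
&= P^{g^{-i}}(\tilde x_t, \tilde a_t \mid a_{1:t-1}, x_t^i, a_t^i). && \text{(35b)}
\end{aligned}$$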
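Finally,

$$\begin{aligned}
P^g(\tilde x_t, \tilde a_t) &= \sum_{a_{1:t-1}, x^i_{1:t}, a_t^i} P^g(\tilde x_t, \tilde a_t \mid a_{1:t-1}, x^i_{1:t}, a_t^i)\,P^g(a_{1:t-1}, x^i_{1:t}, a_t^i) && \text{(36a)} \\
&= \sum_{a_{1:t-1}, x^i_{1:t}, a_t^i} P^{g^{-i}}(\tilde x_t, \tilde a_t \mid a_{1:t-1}, x_t^i, a_t^i)\,P^g(a_{1:t-1}, x^i_{1:t}, a_t^i) && \text{(36b)} \\
&= \sum_{a_{1:t-1}, x_t^i, a_t^i} P^{g^{-i}}(\tilde x_t, \tilde a_t \mid a_{1:t-1}, x_t^i, a_t^i)\,P^g(a_{1:t-1}, x_t^i, a_t^i) && \text{(36c)} \\
&= \sum_{a_{1:t-1}, x_t^i, a_t^i} P^{g^{-i}}(\tilde x_t, \tilde a_t \mid a_{1:t-1}, x_t^i, a_t^i)\,P^{s^i g^{-i}}(a_{1:t-1}, x_t^i, a_t^i) && \text{(36d)} \\
&= P^{s^i g^{-i}}(\tilde x_t, \tilde a_t), && \text{(36e)}
\end{aligned}$$

where (36b) follows from (32) in Claim 4 and (36d) from (29) in Claim 3.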

Appendix B 
(Proof of Fact[^ 


For this proof we will assume the common agent's strategies to be probabilistic as opposed to being
deterministic, as was the case in Section III. This means that actions of the common agent, the $\gamma_t^i$'s, are
generated probabilistically from $\psi^i$ as $\Gamma_t^i \sim \psi_t^i(\cdot \mid a_{1:t-1})$, as opposed to being deterministically generated as
$\gamma_t^i = \psi_t^i[a_{1:t-1}]$, as before. These two are equivalent ways of generating actions $a_t^i$ from $a_{1:t-1}$ and $x_t^i$. We
avoid using the probabilistic strategies of the common agent throughout the main text for ease of exposition
and because it conceptually does not affect the results.

Proof:

We prove this Fact in the following steps. We view this problem from the perspective of a common
agent. Let $\psi$ be the coordinator's policy corresponding to policy profile $g$. Let $\pi_t^i(x_t^i) = P^\psi(x_t^i \mid a_{1:t-1})$.

(a) In Claim 5, we show that $\pi_t$ can be factorized as $\pi_t(x_t) = \prod_{i=1}^N \pi_t^i(x_t^i)$ where each $\pi_t^i$ can be updated
through an update function $\pi_{t+1}^i = F(\pi_t^i, \gamma_t^i, a_t)$ and $F$ is independent of the common agent's policy $\psi$.

(b) In Claim 6, we prove that $(\Pi_t, \Gamma_t)_{t \in \mathcal{T}}$ is a controlled Markov process.

(c) We construct a policy profile $\theta$ from $g$ such that $\theta_t(d\gamma_t \mid \pi_t) = P^\psi(d\gamma_t \mid \pi_t)$.

(d) In Claim 7, we prove that the dynamics of this Markov process $(\Pi_t, \Gamma_t)_{t \in \mathcal{T}}$ under $\theta$ are the same as under $\psi$,
i.e. $P^\theta(d\pi_t, d\gamma_t, d\pi_{t+1}) = P^\psi(d\pi_t, d\gamma_t, d\pi_{t+1})$.

(e) In Claim 8, we prove that with respect to random variables $(X_t, A_t)$, $\pi_t$ can summarize the common
information $a_{1:t-1}$, i.e. $P^\psi(x_t, a_t \mid a_{1:t-1}, \gamma_t) = P(x_t, a_t \mid \pi_t, \gamma_t)$.

(f) From (c), (d) and (e) we then prove the result of the Fact that $P^\psi(x_t, a_t) = P^\theta(x_t, a_t)$, which is
equivalent to $P^g(x_t, a_t) = P^m(x_t, a_t)$, where $m$ is the policy profile of players corresponding to $\theta$.


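Claim 5: $\pi_t$ can be factorized as $\pi_t(x_t) = \prod_{i=1}^N \pi_t^i(x_t^i)$ where each $\pi_t^i$ can be updated through an update
function $\pi_{t+1}^i = F(\pi_t^i, \gamma_t^i, a_t)$ and $F$ is independent of the common agent's policy $\psi$. We also say $\pi_{t+1} = \underline{F}(\pi_t, \gamma_t, a_t)$.

Proof:

We prove this by induction. Since $\pi_1(x_1) = \prod_{i=1}^N Q_1^i(x_1^i)$, the base case is verified. Now suppose $\pi_t = \prod_{i=1}^N \pi_t^i$. Then,

$$\begin{aligned}
\pi_{t+1}(x_{t+1}) &= P^\psi(x_{t+1} \mid a_{1:t}, \gamma_{1:t+1}) && \text{(37a)} \\
&= P^\psi(x_{t+1} \mid a_{1:t}, \gamma_{1:t}) && \text{(37b)} \\
&= \frac{\sum_{x_t} P^\psi(x_t, a_t, x_{t+1} \mid a_{1:t-1}, \gamma_{1:t})}{\sum_{\tilde x_t, \tilde x_{t+1}} P^\psi(\tilde x_t, a_t, \tilde x_{t+1} \mid a_{1:t-1}, \gamma_{1:t})} && \text{(37c)} \\
&= \frac{\sum_{x_t} \pi_t(x_t)\prod_{i=1}^N \gamma_t^i(a_t^i \mid x_t^i)\,Q^i(x_{t+1}^i \mid x_t^i, a_t)}{\sum_{\tilde x_t, \tilde x_{t+1}} \pi_t(\tilde x_t)\prod_{i=1}^N \gamma_t^i(a_t^i \mid \tilde x_t^i)\,Q^i(\tilde x_{t+1}^i \mid \tilde x_t^i, a_t)} && \text{(37d)} \\
&= \prod_{i=1}^N \frac{\sum_{x_t^i} \pi_t^i(x_t^i)\,\gamma_t^i(a_t^i \mid x_t^i)\,Q^i(x_{t+1}^i \mid x_t^i, a_t)}{\sum_{\tilde x_t^i} \pi_t^i(\tilde x_t^i)\,\gamma_t^i(a_t^i \mid \tilde x_t^i)} && \text{(37e)} \\
&= \prod_{i=1}^N \pi_{t+1}^i(x_{t+1}^i), && \text{(37f)}
\end{aligned}$$

where (37e) follows from the induction hypothesis. It is assumed in (37c)-(37e) that the denominator is not 0.
If the denominator corresponding to any $\gamma_t^i$ is zero, we define

$$\pi_{t+1}^i(x_{t+1}^i) = \sum_{x_t^i} \pi_t^i(x_t^i)\,Q^i(x_{t+1}^i \mid x_t^i, a_t), \qquad (38)$$

where $\pi_{t+1}$ still satisfies (37f). Thus $\pi_{t+1}^i = F(\pi_t^i, \gamma_t^i, a_t)$ and $\pi_{t+1} = \underline{F}(\pi_t, \gamma_t, a_t)$ where $F$ and $\underline{F}$ are
appropriately defined from above. ■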
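The per-player update described in (37e) and (38) can be written as a few lines of Python. This is a sketch under stated assumptions rather than a definitive implementation: the function name `update_belief`, the list-based encoding of types, and the convention that `trans` is already conditioned on the observed action profile $a_t$ are all introduced here for illustration.

```python
def update_belief(pi, gamma_a, trans):
    """One step of the per-player belief update F of Claim 5.

    pi      : list, pi[x] = current belief that the player's type is x
    gamma_a : list, gamma_a[x] = probability that type x plays the observed action
    trans   : nested list, trans[x][y] = probability of moving from type x to
              type y given the observed action profile
    """
    n = len(pi)
    weights = [pi[x] * gamma_a[x] for x in range(n)]
    denom = sum(weights)
    if denom <= 0.0:
        # Zero-probability action, as in (38): propagate the current belief
        # through the type transition without conditioning on the action.
        weights, denom = list(pi), sum(pi)
    return [sum(weights[x] * trans[x][y] for x in range(n)) / denom
            for y in range(n)]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]  # static types
    # Only type 0 plays the observed action, so the belief concentrates on it.
    print(update_belief([0.5, 0.5], [1.0, 0.0], identity))
```

Since the joint belief is the product of the per-player beliefs, the joint update amounts to calling this function once per player with that player's own `gamma_a` and transition kernel.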

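Claim 6: $(\Pi_t, \Gamma_t)_{t \in \mathcal{T}}$ is a controlled Markov process with state $\Pi_t$ and control action $\Gamma_t$.

Proof:

$$\begin{aligned}
P^\psi(d\pi_{t+1} \mid \pi_{1:t}, \gamma_{1:t}) &= \sum_{a_t, x_t} P^\psi(d\pi_{t+1}, a_t, x_t \mid \pi_{1:t}, \gamma_{1:t}) && \text{(39a)} \\
&= \sum_{a_t, x_t} P^\psi(x_t \mid \pi_{1:t}, \gamma_{1:t})\Big\{\prod_{i=1}^N \gamma_t^i(a_t^i \mid x_t^i)\Big\} I_{\underline{F}(\pi_t, \gamma_t, a_t)}(\pi_{t+1}) && \text{(39b)} \\
&= \sum_{a_t, x_t} \pi_t(x_t)\Big\{\prod_{i=1}^N \gamma_t^i(a_t^i \mid x_t^i)\Big\} I_{\underline{F}(\pi_t, \gamma_t, a_t)}(\pi_{t+1}) && \text{(39c)} \\
&= P(d\pi_{t+1} \mid \pi_t, \gamma_t). && \text{(39d)}
\end{aligned}$$

For any given policy profile $\psi$, we construct policy profile $\theta$ in the following way.

$$\theta_t(d\gamma_t \mid \pi_t) \triangleq P^\psi(d\gamma_t \mid \pi_t). \qquad (40)$$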

Claim 7:

$$P^\psi(d\pi_t, d\gamma_t, d\pi_{t+1}) = P^\theta(d\pi_t, d\gamma_t, d\pi_{t+1}) \quad \forall t \in \mathcal{T}. \qquad (41)$$

Proof: We prove this by induction. For $t = 1$,

$$P^\psi(d\pi_1) = P^\theta(d\pi_1) = I_Q(\pi_1). \qquad (42)$$

Now suppose $P^\psi(d\pi_t) = P^\theta(d\pi_t)$ is true for $t$, then

$$\begin{aligned}
P^\psi(d\pi_t, d\gamma_t, d\pi_{t+1}) &= P^\psi(d\pi_t)\,P^\psi(d\gamma_t \mid \pi_t)\,P^\psi(d\pi_{t+1} \mid \pi_t, \gamma_t) && \text{(43a)} \\
&= P^\theta(d\pi_t)\,\theta_t(d\gamma_t \mid \pi_t)\,P(d\pi_{t+1} \mid \pi_t, \gamma_t) && \text{(43b)} \\
&= P^\theta(d\pi_t, d\gamma_t, d\pi_{t+1}), && \text{(43c)}
\end{aligned}$$

where (43b) is true from the induction hypothesis, the definition of $\theta$ in (40), and since $(\Pi_t, \Gamma_t)_{t \in \mathcal{T}}$ is a controlled
Markov process as proved in Claim 6 and thus its update kernel does not depend on policy $\psi$. This completes
the induction step. ■

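Claim 8: For any policy $\psi$,

$$P^\psi(x_t, a_t \mid a_{1:t-1}, \gamma_t) = P(x_t, a_t \mid \pi_t, \gamma_t). \qquad (44)$$

Proof:

$$\begin{aligned}
P^\psi(x_t, a_t \mid a_{1:t-1}, \gamma_t) &= P^\psi(x_t \mid a_{1:t-1}, \gamma_t)\prod_{i \in \mathcal{N}} \gamma_t^i(a_t^i \mid x_t^i) && \text{(45a)} \\
&= \pi_t(x_t)\prod_{i \in \mathcal{N}} \gamma_t^i(a_t^i \mid x_t^i) && \text{(45b)} \\
&= P(x_t, a_t \mid \pi_t, \gamma_t). && \text{(45c)}
\end{aligned}$$

■

Finally,

$$\begin{aligned}
P^\psi(x_t, a_t) &= \sum_{a_{1:t-1}, \gamma_t} P^\psi(x_t, a_t \mid a_{1:t-1}, \gamma_t)\,P^\psi(a_{1:t-1}, \gamma_t) && \text{(46a)} \\
&= \sum_{a_{1:t-1}, \gamma_t} P(x_t, a_t \mid \pi_t, \gamma_t)\,P^\psi(a_{1:t-1}, \gamma_t) && \text{(46b)} \\
&= \sum_{\pi_t, \gamma_t} P(x_t, a_t \mid \pi_t, \gamma_t)\,P^\psi(\pi_t, \gamma_t) && \text{(46c)} \\
&= \sum_{\pi_t, \gamma_t} P(x_t, a_t \mid \pi_t, \gamma_t)\,P^\theta(\pi_t, \gamma_t) && \text{(46d)} \\
&= P^\theta(x_t, a_t), && \text{(46e)}
\end{aligned}$$

where (46b) follows from (44), (46c) is a change of variable and (46d) follows from (41).

■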


Appendix C 
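(Proof of Theorem 1)

Proof: We prove (15) using induction and the results in Lemmas 1, 2 and 3 proved in Appendix D.
For the base case at $t = T$, $\forall i \in \mathcal{N}$, $(a_{1:T-1}, x^i_{1:T}) \in \mathcal{H}_T^i$, $\beta^i$,

$$\begin{aligned}
E^{\beta_T^{*,i}\beta_T^{*,-i},\,\mu_T^*[a_{1:T-1}]}\{R^i(X_T, A_T) \mid a_{1:T-1}, x^i_{1:T}\} &= V_T^i(\mu_T^*[a_{1:T-1}], x_T^i) && \text{(47a)} \\
&\ge E^{\beta_T^{i}\beta_T^{*,-i},\,\mu_T^*[a_{1:T-1}]}\{R^i(X_T, A_T) \mid a_{1:T-1}, x^i_{1:T}\}, && \text{(47b)}
\end{aligned}$$

where (47a) follows from Lemma 3 and (47b) follows from Lemma 1 in Appendix D.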
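Let the induction hypothesis be that for $t + 1$, $\forall i \in \mathcal{N}$, $(a_{1:t}, x^i_{1:t+1}) \in \mathcal{H}_{t+1}^i$, $\beta^i$,

$$\begin{aligned}
& E^{\beta_{t+1:T}^{*,i}\beta_{t+1:T}^{*,-i},\,\mu_{t+1}^*[a_{1:t}]}\Big\{\sum_{n=t+1}^{T} R^i(X_n, A_n) \,\Big|\, a_{1:t}, x^i_{1:t+1}\Big\} && \text{(48a)} \\
& \ge E^{\beta_{t+1:T}^{i}\beta_{t+1:T}^{*,-i},\,\mu_{t+1}^*[a_{1:t}]}\Big\{\sum_{n=t+1}^{T} R^i(X_n, A_n) \,\Big|\, a_{1:t}, x^i_{1:t+1}\Big\}. && \text{(48b)}
\end{aligned}$$

Then $\forall i \in \mathcal{N}$, $(a_{1:t-1}, x^i_{1:t}) \in \mathcal{H}_t^i$, $\beta^i$, we have

$$\begin{aligned}
& E^{\beta_{t:T}^{*,i}\beta_{t:T}^{*,-i},\,\mu_t^*[a_{1:t-1}]}\Big\{\sum_{n=t}^{T} R^i(X_n, A_n) \,\Big|\, a_{1:t-1}, x^i_{1:t}\Big\} \\
&= V_t^i(\mu_t^*[a_{1:t-1}], x_t^i) && \text{(49a)} \\
&\ge E^{\beta_t^{i}\beta_t^{*,-i},\,\mu_t^*[a_{1:t-1}]}\Big\{R^i(X_t, A_t) + V_{t+1}^i(\mu_{t+1}^*[a_{1:t-1}A_t], X_{t+1}^i) \,\Big|\, a_{1:t-1}, x^i_{1:t}\Big\} && \text{(49b)} \\
&= E^{\beta_t^{i}\beta_t^{*,-i},\,\mu_t^*[a_{1:t-1}]}\Big\{R^i(X_t, A_t) + E^{\beta_{t+1:T}^{*,i}\beta_{t+1:T}^{*,-i},\,\mu_{t+1}^*[a_{1:t-1}A_t]}\Big\{\sum_{n=t+1}^{T} R^i(X_n, A_n) \,\Big|\, a_{1:t-1}, A_t, x^i_{1:t}, X_{t+1}^i\Big\} \,\Big|\, a_{1:t-1}, x^i_{1:t}\Big\} && \text{(49c)} \\
&\ge E^{\beta_t^{i}\beta_t^{*,-i},\,\mu_t^*[a_{1:t-1}]}\Big\{R^i(X_t, A_t) + E^{\beta_{t+1:T}^{i}\beta_{t+1:T}^{*,-i},\,\mu_{t+1}^*[a_{1:t-1}A_t]}\Big\{\sum_{n=t+1}^{T} R^i(X_n, A_n) \,\Big|\, a_{1:t-1}, A_t, x^i_{1:t}, X_{t+1}^i\Big\} \,\Big|\, a_{1:t-1}, x^i_{1:t}\Big\} && \text{(49d)} \\
&= E^{\beta_t^{i}\beta_t^{*,-i},\,\mu_t^*[a_{1:t-1}]}\Big\{R^i(X_t, A_t) + E^{\beta_{t:T}^{i}\beta_{t:T}^{*,-i},\,\mu_t^*[a_{1:t-1}]}\Big\{\sum_{n=t+1}^{T} R^i(X_n, A_n) \,\Big|\, a_{1:t-1}, A_t, x^i_{1:t}, X_{t+1}^i\Big\} \,\Big|\, a_{1:t-1}, x^i_{1:t}\Big\} && \text{(49e)} \\
&= E^{\beta_{t:T}^{i}\beta_{t:T}^{*,-i},\,\mu_t^*[a_{1:t-1}]}\Big\{\sum_{n=t}^{T} R^i(X_n, A_n) \,\Big|\, a_{1:t-1}, x^i_{1:t}\Big\}, && \text{(49f)}
\end{aligned}$$

where (49a) follows from Lemma 3, (49b) follows from Lemma 1, (49c) follows from Lemma 3, (49d)
follows from the induction hypothesis in (48b) and (49e) follows from Lemma 2. Moreover, the construction of $\theta$
in the backward recursion and consequently the definition of $\beta^*$ in (13) are pivotal for (49e) to follow from (49d).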

We note that $\mu^*$ satisfies the consistency condition of [[^ p. 331] from the fact that (a) for all $t$ and for
every common history $a_{1:t-1}$, all players use the same belief $\mu_t^*[a_{1:t-1}]$ on $x_t$ and (b) the belief $\mu_t^*$ can be
factorized as $\mu_t^*[a_{1:t-1}] = \prod_{i=1}^N \mu_t^{*,i}[a_{1:t-1}]$ $\forall a_{1:t-1} \in \mathcal{H}_t^c$, where $\mu_t^{*,i}$ is updated through Bayes' rule ($F$) as
in Claim 5 in Appendix B.


Appendix D 

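Lemma 1: $\forall t \in \mathcal{T}$, $i \in \mathcal{N}$, $(a_{1:t-1}, x^i_{1:t}) \in \mathcal{H}_t^i$, $\beta_t^i$

$$V_t^i(\mu_t^*[a_{1:t-1}], x_t^i) \ge E^{\beta_t^i\beta_t^{*,-i},\,\mu_t^*[a_{1:t-1}]}\Big\{R^i(X_t, A_t) + V_{t+1}^i\big(\underline{F}(\mu_t^*[a_{1:t-1}], \beta_t^*(\cdot \mid a_{1:t-1}, \cdot), A_t), X_{t+1}^i\big) \,\Big|\, a_{1:t-1}, x^i_{1:t}\Big\}. \qquad (50)$$

Proof: We prove this Lemma by contradiction.

Suppose the claim is not true for $t$. This implies $\exists i, \hat\beta_t^i, \hat a_{1:t-1}, \hat x^i_{1:t}$ such that

$$E^{\hat\beta_t^i\beta_t^{*,-i},\,\mu_t^*[\hat a_{1:t-1}]}\Big\{R^i(X_t, A_t) + V_{t+1}^i\big(\underline{F}(\mu_t^*[\hat a_{1:t-1}], \beta_t^*(\cdot \mid \hat a_{1:t-1}, \cdot), A_t), X_{t+1}^i\big) \,\Big|\, \hat a_{1:t-1}, \hat x^i_{1:t}\Big\} > V_t^i(\mu_t^*[\hat a_{1:t-1}], \hat x_t^i). \qquad (51)$$

We will show that this leads to a contradiction.

Construct

$$\hat\gamma_t^i(a_t^i \mid x_t^i) = \begin{cases} \hat\beta_t^i(a_t^i \mid \hat a_{1:t-1}, \hat x^i_{1:t}) & x_t^i = \hat x_t^i \\ \text{arbitrary} & \text{otherwise.} \end{cases}$$

Then for $\hat a_{1:t-1}, \hat x^i_{1:t}$ we have

$$\begin{aligned}
& V_t^i(\mu_t^*[\hat a_{1:t-1}], \hat x_t^i) && \text{(52a)} \\
&= \max_{\gamma_t^i(\cdot \mid \hat x_t^i)} E^{\gamma_t^i(\cdot \mid \hat x_t^i)\beta_t^{*,-i},\,\mu_t^*[\hat a_{1:t-1}]}\Big\{R^i(\hat x_t^i, X_t^{-i}, A_t) + V_{t+1}^i\big(\underline{F}(\mu_t^*[\hat a_{1:t-1}], \beta_t^*(\cdot \mid \hat a_{1:t-1}, \cdot), A_t), X_{t+1}^i\big) \,\Big|\, \hat x_t^i\Big\} && \text{(52b)} \\
&\ge E^{\hat\gamma_t^i(\cdot \mid \hat x_t^i)\beta_t^{*,-i},\,\mu_t^*[\hat a_{1:t-1}]}\Big\{R^i(\hat x_t^i, X_t^{-i}, A_t) + V_{t+1}^i\big(\underline{F}(\mu_t^*[\hat a_{1:t-1}], \beta_t^*(\cdot \mid \hat a_{1:t-1}, \cdot), A_t), X_{t+1}^i\big) \,\Big|\, \hat x_t^i\Big\} && \text{(52c)} \\
&= \sum_{x_t^{-i}, a_t, x_{t+1}^i} \Big\{R^i(\hat x_t^i x_t^{-i}, a_t) + V_{t+1}^i\big(\underline{F}(\mu_t^*[\hat a_{1:t-1}], \beta_t^*(\cdot \mid \hat a_{1:t-1}, \cdot), a_t), x_{t+1}^i\big)\Big\} \times \\
&\qquad \mu_t^{*,-i}[\hat a_{1:t-1}](x_t^{-i})\,\hat\beta_t^i(a_t^i \mid \hat a_{1:t-1}, \hat x^i_{1:t})\,\beta_t^{*,-i}(a_t^{-i} \mid \hat a_{1:t-1}, x_t^{-i})\,Q^i(x_{t+1}^i \mid \hat x_t^i, a_t) && \text{(52d)} \\
&= E^{\hat\beta_t^i\beta_t^{*,-i},\,\mu_t^*[\hat a_{1:t-1}]}\Big\{R^i(X_t, A_t) + V_{t+1}^i\big(\underline{F}(\mu_t^*[\hat a_{1:t-1}], \beta_t^*(\cdot \mid \hat a_{1:t-1}, \cdot), A_t), X_{t+1}^i\big) \,\Big|\, \hat a_{1:t-1}, \hat x^i_{1:t}\Big\} && \text{(52e)} \\
&> V_t^i(\mu_t^*[\hat a_{1:t-1}], \hat x_t^i), && \text{(52f)}
\end{aligned}$$

where (52b) follows from the definition of $V_t^i$, (52d) follows from the definition of $\hat\gamma_t^i$ and (52f) follows from
(51). However, this leads to a contradiction. ■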

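Lemma 2: $\forall i \in \mathcal{N}$, $t \in \mathcal{T}$, $(a_{1:t}, x^i_{1:t+1}) \in \mathcal{H}_{t+1}^i$ and $\beta_t^i$

$$E^{\beta_{t:T}^{i}\beta_{t:T}^{*,-i},\,\mu_t^*[a_{1:t-1}]}\Big\{\sum_{n=t+1}^{T} R^i(X_n, A_n) \,\Big|\, a_{1:t}, x^i_{1:t+1}\Big\} = E^{\beta_{t+1:T}^{i}\beta_{t+1:T}^{*,-i},\,\mu_{t+1}^*[a_{1:t}]}\Big\{\sum_{n=t+1}^{T} R^i(X_n, A_n) \,\Big|\, a_{1:t}, x^i_{1:t+1}\Big\}. \qquad (53)$$

Thus the above quantities do not depend on $\beta_t^i$.

Proof: Essentially this claim stands on the fact that $\mu_{t+1}^{*,-i}[a_{1:t}]$ can be updated from $\mu_t^{*,-i}[a_{1:t-1}]$, $\beta_t^{*,-i}$
and $a_t$, as $\mu_{t+1}^{*,-i}[a_{1:t}] = \prod_{j \neq i} F(\mu_t^{*,j}[a_{1:t-1}], \beta_t^{*,j}, a_t)$, as in Claim 5. Since the above expectations involve
random variables $X_{t+1}^{-i}, A_{t+1:T}, X_{t+2:T}$, we consider $P^{\beta_{t:T}^{i}\beta_{t:T}^{*,-i},\,\mu_t^*[a_{1:t-1}]}(x_{t+1}^{-i}, a_{t+1:T}, x_{t+2:T} \mid a_{1:t}, x^i_{1:t+1})$. Writing $P^{(t)}$ for $P^{\beta_{t:T}^{i}\beta_{t:T}^{*,-i},\,\mu_t^*[a_{1:t-1}]}$,

$$P^{(t)}(x_{t+1}^{-i}, a_{t+1:T}, x_{t+2:T} \mid a_{1:t}, x^i_{1:t+1}) = \frac{P^{(t)}(a_t, x_{t+1}, a_{t+1:T}, x_{t+2:T} \mid a_{1:t-1}, x^i_{1:t})}{\sum_{\tilde x_{t+1}^{-i}, \tilde a_{t+1:T}, \tilde x_{t+2:T}} P^{(t)}(a_t, x_{t+1}^i, \tilde x_{t+1}^{-i}, \tilde a_{t+1:T}, \tilde x_{t+2:T} \mid a_{1:t-1}, x^i_{1:t})}. \qquad (54a)$$

We consider the numerator and the denominator separately. The numerator in (54a) is given by

$$\begin{aligned}
& \sum_{x_t^{-i}} P^{(t)}(x_t^{-i} \mid a_{1:t-1}, x^i_{1:t})\,P^{(t)}(a_t, x_{t+1} \mid a_{1:t-1}, x^i_{1:t}, x_t^{-i})\,P^{(t)}(a_{t+1:T}, x_{t+2:T} \mid a_{1:t}, x^i_{1:t}, x_t^{-i}, x_{t+1}) && \text{(54b)} \\
&= \sum_{x_t^{-i}} \mu_t^{*,-i}[a_{1:t-1}](x_t^{-i})\,\beta_t^i(a_t^i \mid a_{1:t-1}, x^i_{1:t})\,\beta_t^{*,-i}(a_t^{-i} \mid a_{1:t-1}, x_t^{-i})\,Q(x_{t+1} \mid x_t, a_t)\,P^{\beta_{t+1:T}^{i}\beta_{t+1:T}^{*,-i}}(a_{t+1:T}, x_{t+2:T} \mid a_{1:t}, x^i_{1:t+1}, x_{t+1}^{-i}), && \text{(54c)}
\end{aligned}$$

where (54c) follows from the conditional independence of types given common information, as shown
in Claim 1, and the fact that the probability on $(a_{t+1:T}, x_{t+2:T})$ given $a_{1:t}, x^i_{1:t}, x_{t+1}, \mu_t^*[a_{1:t-1}]$ depends on
$a_{1:t}, x^i_{1:t+1}, x_{t+1}^{-i}$ through $\beta_{t+1:T}^{i}\beta_{t+1:T}^{*,-i}$. Similarly, the denominator in (54a) is given by

$$\begin{aligned}
& \sum_{\tilde x_t^{-i}} P^{(t)}(\tilde x_t^{-i} \mid a_{1:t-1}, x^i_{1:t})\,P^{(t)}(a_t, x_{t+1}^i \mid a_{1:t-1}, x^i_{1:t}, \tilde x_t^{-i}) && \text{(54d)} \\
&= \sum_{\tilde x_t^{-i}} \mu_t^{*,-i}[a_{1:t-1}](\tilde x_t^{-i})\,\beta_t^i(a_t^i \mid a_{1:t-1}, x^i_{1:t})\,\beta_t^{*,-i}(a_t^{-i} \mid a_{1:t-1}, \tilde x_t^{-i})\,Q^i(x_{t+1}^i \mid x_t^i, a_t). && \text{(54e)}
\end{aligned}$$

By canceling the terms $\beta_t^i(\cdot)$ and $Q^i(\cdot)$ in the numerator and the denominator, (54a) is given by

$$\begin{aligned}
& \frac{\sum_{x_t^{-i}} \mu_t^{*,-i}[a_{1:t-1}](x_t^{-i})\,\beta_t^{*,-i}(a_t^{-i} \mid a_{1:t-1}, x_t^{-i})\,Q^{-i}(x_{t+1}^{-i} \mid x_t^{-i}, a_t)}{\sum_{\tilde x_t^{-i}} \mu_t^{*,-i}[a_{1:t-1}](\tilde x_t^{-i})\,\beta_t^{*,-i}(a_t^{-i} \mid a_{1:t-1}, \tilde x_t^{-i})} \times P^{\beta_{t+1:T}^{i}\beta_{t+1:T}^{*,-i}}(a_{t+1:T}, x_{t+2:T} \mid a_{1:t}, x^i_{1:t+1}, x_{t+1}^{-i}) && \text{(54f)} \\
&= \mu_{t+1}^{*,-i}[a_{1:t}](x_{t+1}^{-i})\,P^{\beta_{t+1:T}^{i}\beta_{t+1:T}^{*,-i}}(a_{t+1:T}, x_{t+2:T} \mid a_{1:t}, x^i_{1:t+1}, x_{t+1}^{-i}) && \text{(54g)} \\
&= P^{\beta_{t+1:T}^{i}\beta_{t+1:T}^{*,-i},\,\mu_{t+1}^*[a_{1:t}]}(x_{t+1}^{-i}, a_{t+1:T}, x_{t+2:T} \mid a_{1:t}, x^i_{1:t+1}), && \text{(54h)}
\end{aligned}$$

where (54g) follows from using the definition of $\mu_{t+1}^*[a_{1:t}](\cdot)$ in the forward recursive step in (14) and
the definition of the belief update in (37). ■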

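Lemma 3: $\forall i \in \mathcal{N}$, $t \in \mathcal{T}$, $(a_{1:t-1}, x^i_{1:t}) \in \mathcal{H}_t^i$,

$$V_t^i(\mu_t^*[a_{1:t-1}], x_t^i) = E^{\beta_{t:T}^{*,i}\beta_{t:T}^{*,-i},\,\mu_t^*[a_{1:t-1}]}\Big\{\sum_{n=t}^{T} R^i(X_n, A_n) \,\Big|\, a_{1:t-1}, x^i_{1:t}\Big\}. \qquad (55)$$

Proof:

We prove the Lemma by induction. For $t = T$,

$$\begin{aligned}
& E^{\beta_T^{*,i}\beta_T^{*,-i},\,\mu_T^*[a_{1:T-1}]}\{R^i(X_T, A_T) \mid a_{1:T-1}, x^i_{1:T}\} \\
&= \sum_{x_T^{-i}, a_T} R^i(x_T, a_T)\,\mu_T^{*,-i}[a_{1:T-1}](x_T^{-i})\,\beta_T^{*,i}(a_T^i \mid a_{1:T-1}, x_T^i)\,\beta_T^{*,-i}(a_T^{-i} \mid a_{1:T-1}, x_T^{-i}) && \text{(56a)} \\
&= V_T^i(\mu_T^*[a_{1:T-1}], x_T^i), && \text{(56b)}
\end{aligned}$$

where (56b) follows from the definition of $V_T^i$ and the definition of $\beta_T^*$ in the forward recursion in (13).

Suppose the claim is true for $t + 1$, i.e., $\forall i \in \mathcal{N}$, $t \in \mathcal{T}$, $(a_{1:t}, x^i_{1:t+1}) \in \mathcal{H}_{t+1}^i$

$$V_{t+1}^i(\mu_{t+1}^*[a_{1:t}], x_{t+1}^i) = E^{\beta_{t+1:T}^{*,i}\beta_{t+1:T}^{*,-i},\,\mu_{t+1}^*[a_{1:t}]}\Big\{\sum_{n=t+1}^{T} R^i(X_n, A_n) \,\Big|\, a_{1:t}, x^i_{1:t+1}\Big\}. \qquad (57)$$

Then $\forall i \in \mathcal{N}$, $t \in \mathcal{T}$, $(a_{1:t-1}, x^i_{1:t}) \in \mathcal{H}_t^i$, we have

$$\begin{aligned}
& E^{\beta_{t:T}^{*,i}\beta_{t:T}^{*,-i},\,\mu_t^*[a_{1:t-1}]}\Big\{\sum_{n=t}^{T} R^i(X_n, A_n) \,\Big|\, a_{1:t-1}, x^i_{1:t}\Big\} \\
&= E^{\beta_{t:T}^{*,i}\beta_{t:T}^{*,-i},\,\mu_t^*[a_{1:t-1}]}\Big\{R^i(X_t, A_t) + E^{\beta_{t:T}^{*,i}\beta_{t:T}^{*,-i},\,\mu_t^*[a_{1:t-1}]}\Big\{\sum_{n=t+1}^{T} R^i(X_n, A_n) \,\Big|\, a_{1:t-1}, A_t, x^i_{1:t}, X_{t+1}^i\Big\} \,\Big|\, a_{1:t-1}, x^i_{1:t}\Big\} && \text{(58a)} \\
&= E^{\beta_{t:T}^{*,i}\beta_{t:T}^{*,-i},\,\mu_t^*[a_{1:t-1}]}\Big\{R^i(X_t, A_t) + E^{\beta_{t+1:T}^{*,i}\beta_{t+1:T}^{*,-i},\,\mu_{t+1}^*[a_{1:t-1}A_t]}\Big\{\sum_{n=t+1}^{T} R^i(X_n, A_n) \,\Big|\, a_{1:t-1}, A_t, x^i_{1:t}, X_{t+1}^i\Big\} \,\Big|\, a_{1:t-1}, x^i_{1:t}\Big\} && \text{(58b)} \\
&= E^{\beta_{t:T}^{*,i}\beta_{t:T}^{*,-i},\,\mu_t^*[a_{1:t-1}]}\Big\{R^i(X_t, A_t) + V_{t+1}^i(\mu_{t+1}^*[a_{1:t-1}A_t], X_{t+1}^i) \,\Big|\, a_{1:t-1}, x^i_{1:t}\Big\} && \text{(58c)} \\
&= E^{\beta_t^{*,i}\beta_t^{*,-i},\,\mu_t^*[a_{1:t-1}]}\Big\{R^i(X_t, A_t) + V_{t+1}^i(\mu_{t+1}^*[a_{1:t-1}A_t], X_{t+1}^i) \,\Big|\, a_{1:t-1}, x^i_{1:t}\Big\} && \text{(58d)} \\
&= V_t^i(\mu_t^*[a_{1:t-1}], x_t^i), && \text{(58e)}
\end{aligned}$$

where (58b) follows from Lemma 2 in Appendix D, (58c) follows from the induction hypothesis in
(57), (58d) follows because the random variables involved in the expectation, $X_t^{-i}, A_t, X_{t+1}^i$, do not depend
on $\beta_{t+1:T}^{*,i}\beta_{t+1:T}^{*,-i}$, and (58e) follows from the definition of $\beta_t^*$ in the forward recursion in (13), the definition
of $\mu_{t+1}^*$ in (14) and the definition of $V_t^i$.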